Abstract. Relational data exchange is the problem of translating relational data from 
a source schema into a target schema, according to a specification of the relationship 
between the source data and the target data. One of the basic issues is how to answer 
queries that are posed against target data. While consensus has been reached on the 
definitive semantics for monotonic queries, this issue turned out to be considerably more 
difficult for non-monotonic queries. Several semantics for non-monotonic queries have been 
proposed in the past few years. 

This article proposes a new semantics for non-monotonic queries, called the GCWA*-semantics.
It is inspired by semantics from the area of deductive databases. We show that
the GCWA*-semantics coincides with the standard open world semantics on monotonic
queries, and we further explore the (data) complexity of evaluating non-monotonic queries
under the GCWA*-semantics. In particular, we introduce a class of schema mappings for
which universal queries can be evaluated under the GCWA*-semantics in polynomial time
(data complexity) on the core of the universal solutions.



1. Introduction 

Data exchange is the problem of translating databases from a source schema into a target
schema, thereby providing access to the source database through a materialized database
over the target schema. It is a special case of data integration [28] and arises in tasks like
data restructuring, updating data warehouses using ETL processes, or in exchanging data
between different, possibly independently created, applications (see, e.g., [18, 10]). Tools for
dealing with data exchange have been available for quite a while [37, 18, 34]. Fundamental concepts
and algorithmic issues in data exchange have been studied recently by Fagin, Kolaitis, Miller,
and Popa in their seminal paper [10]. For a comprehensive overview on data exchange, the
reader is referred to [10] or any of the surveys \26\ [6l [U U] . 

This article deals with relational data exchange, which received a lot of attention in the 
data exchange community (see, e.g., the survey articles cited above). In this setting, the 
mapping from source data to target data is described by a schema mapping $M = (\sigma, \tau, \Sigma)$,
which consists of relational database schemas $\sigma$ and $\tau$ (finite sets of relation names with
associated arities), called source schema and target schema, respectively, and a finite set $\Sigma$
of constraints (typically, sentences in some fragment of first-order logic) which can refer to
the relation names in $\sigma$ and $\tau$. Typical constraints are tuple generating dependencies (tgds),
which come in two flavors, st-tgds and t-tgds, and equality generating dependencies (egds).
For example, st-tgds are first-order sentences of the form $\forall \bar{x}, \bar{y}\,(\varphi(\bar{x}, \bar{y}) \to \exists \bar{z}\, \psi(\bar{x}, \bar{z}))$, where
$\varphi$ is a conjunction of relation atoms over $\sigma$, and $\psi$ is a conjunction of relation atoms over
$\tau$. Their precise definitions are deferred to Section 2.3. Given a relational database instance
$S$ over $\sigma$ (called source instance for $M$), a solution for $S$ under $M$ is a relational database
instance $T$ over $\tau$ such that the instance $S \cup T$ over $\sigma \cup \tau$ that consists of the relations in $S$
and $T$ satisfies all the constraints in $\Sigma$.

An important task in relational data exchange is to answer queries that are posed against 
the target schema of a schema mapping. The answer to a query should be semantically 
consistent with the source data and the schema mapping, that is, it should reflect the 
information in the source instance and the schema mapping as well as possible. Since a 
source instance usually has more than one solution, a fundamental question is: What is the 
semantics of a query, that is, which tuples constitute the set of answers to a query over 
the target schema of a schema mapping and a given source instance? Furthermore, in data 
exchange the goal is to answer queries using a materialized solution, without access to the 
source instance. This brings us to a second fundamental question: Given a source instance, 
which solution should we compute in order to be able to answer queries? 

Concerning the first question, the certain answers semantics, introduced in [10], has 
proved to be adequate for answering a wide range of queries such as unions of conjunctive 
queries (a.k.a. existential positive first-order queries). Under the certain answers semantics, 
a query q is answered by the set of all tuples that are answers to q no matter which solution 
$q$ is evaluated on. More precisely, the certain answers consist of all those tuples $\bar{a}$ such
that $q(\bar{a})$ is true in all solutions. Concerning the second question, the universal solutions
proposed in [10] have proved to be very useful. Universal solutions can be regarded as
most general solutions in the sense that they contain sound and complete information. In a 
number of settings, they can be computed efficiently [T0 | [T2 | [TB I l^3 | \8\ 133 1 flSj . It was shown 
that the certain answers to unions of conjunctive queries can be computed by evaluating 
such a query on an arbitrary universal solution, followed by a simple post-processing step 
[10]. Similar results hold for other monotonic queries, like unions of conjunctive queries with 
inequalities [10, 8, 5].

For many non-monotonic queries, the certain answers semantics yields results that in- 
tuitively do not seem to be accurate \W\ O [29] o The following example illustrates the basic 
problem: 

Example 1.1. Consider a schema mapping $M = (\{R\}, \{R'\}, \Sigma)$, where $R, R'$ are binary
relation symbols and $\Sigma$ contains the single st-tgd

$\theta := \forall x, y\,(R(x,y) \to R'(x,y))$.

Let $S$ be a source instance for $M$ where $R$ is interpreted by $R^S := \{(a,b)\}$. Since schema
mappings describe translations from source to target, it seems natural to assume that $M$
and $S$ together give a complete description of the solutions for $S$ under $M$, namely that



A common assumption is that the source instance is not available after the data exchange has been
performed [10].

It was also pointed out in [3, 29] that similar problems arise for the universal solution-based semantics
from [12].
such a solution contains the tuple $(a,b)$ in $R'$ (as implied by $\theta$ and the tuple $(a,b)$ in $R^S$),
but no other tuple (since this is not implied by $M$ and $S$). In particular, it seems natural to
assume that the instance $T$ which interprets $R'$ by the relation $\{(a,b)\}$ is the only solution
for $S$ under $M$, and that the answer to the query

$q(x,y) := R'(x,y) \land \forall z\,(R'(x,z) \to z = y)$

with respect to $M$ and $S$ is $\{(a,b)\}$. However, the certain answers to $q$ with respect to $M$
and $S$ are empty.
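The effect described in Example 1.1 can be reproduced with a small Python sketch. It is an illustration only: the encoding of $R'$ as a set of pairs and the three-element value pool `values` are assumptions made here, and the sketch enumerates only the solutions over that pool rather than all solutions. Since intersecting over fewer solutions can only enlarge the result, an empty intersection over this subset already shows that the certain answers are empty.

```python
from itertools import product

def q(instance):
    """q(x, y) := R'(x, y) and for all z: R'(x, z) -> z = y, on a set of pairs."""
    return {(x, y) for (x, y) in instance
            if all(z == y for (x2, z) in instance if x2 == x)}

def is_solution(source_r, target_r):
    """The st-tgd R(x, y) -> R'(x, y) holds iff R is contained in R'."""
    return source_r <= target_r

source = {("a", "b")}
# Hypothetical finite pool of values, used only to enumerate some solutions.
values = ["a", "b", "c"]
all_pairs = list(product(values, repeat=2))

# Enumerate every target instance over the pool and keep the solutions.
solutions = []
for mask in range(2 ** len(all_pairs)):
    t = {p for i, p in enumerate(all_pairs) if (mask >> i) & 1}
    if is_solution(source, t):
        solutions.append(t)

# Intersection of q over the enumerated solutions (a superset of the certain answers).
certain = set.intersection(*(q(t) for t in solutions))
# Answer on the inclusion-minimal solution alone.
minimal_only = q({("a", "b")})
```

The solution $\{(a,b),(a,c)\}$ is among those enumerated, and $q$ returns nothing on it, which is why `certain` ends up empty while `minimal_only` contains $(a,b)$.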

The assumption that a schema mapping $M$ and a source instance $S$ for $M$ give a complete
description of the solutions for $S$ under $M$ corresponds to the closed world assumption
(CWA) [36], as opposed to the open world assumption (OWA) underlying the certain answers
semantics. To remedy the problems mentioned above, Libkin [29] proposed semantics
based on the CWA, which were later extended to a more general setting [23] (a combined
version of [29] and [23] appeared in [22]). While the CWA-semantics work well in a number
of situations (e.g., if a unique inclusion-minimal solution exists), they still lead to
counter-intuitive answers in certain other situations [30, 2]. To address this, Libkin and Sirangelo [30]
proposed a combination of the CWA and the OWA, whereas Afrati and Kolaitis [2] studied a
restricted version of the CWA-semantics, and showed it to be useful for answering aggregate
queries. Henceforth, we use the term non-monotonic semantics to refer to the semantics
from [22, 30, 2]. In contrast, we call the certain answers semantics OWA-semantics.

A drawback of the non-monotonic semantics is that most of them are not invariant
under logically equivalent schema mappings. That is, they do not necessarily lead to the
same answers with respect to schema mappings specified by logically equivalent sets of
constraints (see Section 3). Since logically equivalent schema mappings intuitively specify
the same translation of source data to the target schema, it seems natural, though, that the
answer to a query is the same on logically equivalent schema mappings. Furthermore, it
can be observed that the non-monotonic semantics do not necessarily reflect the standard
semantics of first-order quantifiers (see Section 3). For example, consider a schema mapping
$M = (\{P\}, \{Q\}, \{\theta\})$ with $\theta = \forall x\,(P(x) \to \exists y\, Q(x,y))$, and let $S$ be a source instance for
$M$ with a single element $a$ in $P$. Under almost all of the non-monotonic semantics, the
answer to the query $q$ = "Is there exactly one $y$ with $Q(a,y)$?" is true. However, existential
quantification $\exists y\, Q(x,y)$ is typically interpreted as: there is one $y$ with $Q(x,y)$, or there are
two $y$ with $Q(x,y)$, or there are three $y$ with $Q(x,y)$, and so on. To be consistent with this
interpretation, the answer to $q$ should be false, as otherwise the possibility of having two or
more $y$ with $Q(x,y)$ is excluded. Another reason why it is natural to answer $q$ by false is
that $\theta$ can be expressed equivalently as $\theta' = \forall x\,(P(x) \to \bigvee_{c} Q(x,c))$, where $c$ ranges over all
possible values. Since $M$ and $M' = (\{P\}, \{Q\}, \{\theta'\})$ are logically equivalent, the answer to
$q$ should either be true or false with respect to both $M$ and $M'$. Letting the answer be true
would not reflect the intended meaning of the disjunction in $\theta'$, unless we wish to interpret
disjunctions exclusively.

This article introduces a new semantics for answering non-monotonic queries, called
GCWA*-semantics, that is invariant under logically equivalent schema mappings, and intuitively
reflects the standard semantics of first-order quantifiers. The starting point for the
development of the GCWA*-semantics is the observation that query answering with respect
to schema mappings is very similar to query answering on deductive databases [13] (see
Section 4), and that non-monotonic query answering on deductive databases is a well-studied
topic (see, e.g., [36, 35, 38, 7, 13, 9]). Many of the query answering semantics proposed in this
area can be applied with minor modifications to answer queries in relational data exchange.
Therefore, it seems obvious to study these semantics in the context of data exchange. This
is done in Section 4. More precisely, we consider the semantics based on Reiter's CWA [36],
the generalized CWA (GCWA) [35], the extended GCWA (EGCWA) [38], and the possible
worlds semantics (PWS) [?]. It turns out that the semantics based on Reiter's CWA and
the EGCWA are too strong, the GCWA-based semantics is too weak, and the PWS is not
invariant under logically equivalent schema mappings. On the other hand, the GCWA-based
semantics seems to be a good starting point for developing the GCWA*-semantics.

In contrast to the other non-monotonic semantics, the GCWA*-semantics is defined
with respect to all possible schema mappings. It is based on the new concept of
GCWA*-solutions, in the sense that, under the GCWA*-semantics, the set of answers to a query
$q(\bar{x})$ with respect to a schema mapping $M$ and a source instance $S$ consists of all tuples $\bar{a}$
such that $q(\bar{a})$ holds in all GCWA*-solutions for $S$ under $M$. GCWA*-solutions have a very
simple definition in many of the settings considered in the data exchange literature (e.g., 
with respect to schema mappings specified by st-tgds and egds): in these settings they are 
basically unions of inclusion-minimal solutions. 

The major part of this article deals with the data complexity of evaluating queries under 
the GCWA*-semantics. Data complexity here means that the schema mapping and the query 
are fixed (i.e., they are not part of the input). We show that the GCWA*-semantics and 
the OWA-semantics coincide for monotonic queries (Proposition 6.1), so that all results on 
evaluating monotonic queries under the OWA-semantics carry over to the GCWA*-semantics. 
On the other hand, there are simple schema mappings (e.g., schema mappings specified by 
LAV tgds), and simple non-monotonic Boolean first-order queries for which query evaluation 
under the GCWA*-semantics is co-NP-hard or even undecidable (Propositions 16^2] and [6^ . 

The main result (Theorem 6.6) shows that universal queries (first-order queries of the
form $\forall \bar{x}\, \varphi$ with $\varphi$ quantifier-free) can be evaluated in polynomial time under the
GCWA*-semantics, provided the schema mapping is specified by packed st-tgds, which we introduce
in this article. Packed st-tgds are st-tgds of the form $\forall \bar{x}\, \forall \bar{y}\,(\varphi(\bar{x}, \bar{y}) \to \exists \bar{z}\, \psi(\bar{x}, \bar{z}))$, where
every two distinct atomic formulas in $\psi$ share a variable from $\bar{z}$. This is a rather strong
restriction, but still allows for non-trivial use of existential quantifiers in st-tgds. Surprisingly,
the undecidability result mentioned above involves a schema mapping defined by packed
st-tgds, and a first-order query starting with a block of existential quantifiers and containing
just one universal quantifier. The main result not only states that universal queries
can be evaluated in polynomial time under the GCWA*-semantics and schema mappings
defined by packed st-tgds, but also shows that the answers can be computed from the core
of the universal solutions (core solution, for short), without access to the source instance.
The core solution is the smallest universal solution and has been extensively studied in the
literature (see, e.g., |12^ [T6l [22l |8l |33l IS]). Furthermore, since the core solution can be 
used to evaluate unions of conjunctive queries under the OWA-semantics, we need only one 
solution, the core solution, to answer both types of queries, unions of conjunctive queries 
and universal queries. 

The article is organized as follows. In Section 2, we fix basic definitions and mention
basic results that are used throughout this article. Section 3 shows that the previously
proposed non-monotonic semantics are not necessarily invariant under logically equivalent
schema mappings, and that they do not necessarily reflect the standard semantics of
first-order quantifiers. In Section 4, we then study several of the query answering semantics
for deductive databases in the context of data exchange. The new GCWA*-semantics is
introduced and illustrated in Section 5, and the data complexity of answering queries under
the GCWA*-semantics is explored in Section 6.

2. Preliminaries

We use standard terminology from database theory, but slightly different notation. See, e.g., 
[1] for a comprehensive introduction to database theory. 

2.1. Databases. A schema is a finite set $\sigma$ of relation symbols, where each $R \in \sigma$ has a
fixed arity $\mathrm{ar}(R) \geq 1$. An instance $I$ over $\sigma$ assigns to each $R \in \sigma$ a finite relation $R^I$ of
arity $\mathrm{ar}(R)$. The active domain of $I$ (the set of all values that occur in $I$) is denoted by
$\mathrm{dom}(I)$. As usual in data exchange, we assume that $\mathrm{dom}(I) \subseteq \mathrm{Dom}$, where Dom is the
union of two fixed disjoint infinite sets: the set Const of all constants, and the set Null of
all (labeled) nulls. Constants are denoted by letters $a, b, c, \ldots$ and variants like $a', a_1$. Nulls
serve as placeholders, or variables, for unknown constants; we will denote them by $\bot$ and
variants like $\bot', \bot_1$. Let $\mathrm{const}(I) := \mathrm{dom}(I) \cap \mathrm{Const}$ and $\mathrm{nulls}(I) := \mathrm{dom}(I) \cap \mathrm{Null}$. An
instance is called ground if it contains no nulls.

An atom is an expression of the form $R(\bar{t})$, where $R$ is a relation symbol, and
$\bar{t} \in \mathrm{Dom}^{\mathrm{ar}(R)}$. We often identify an instance $I$ with the set of all atoms $R(\bar{t})$ with $\bar{t} \in R^I$, that
is, we often view $I$ as the set $\{R(\bar{t}) \mid R \in \sigma,\ \bar{t} \in R^I\}$. An atom $R(\bar{t})$ is called ground if $\bar{t}$
contains no nulls.

We extend mappings $f\colon X \to Y$, where $X$ and $Y$ are arbitrary sets, to tuples, atoms,
and instances as follows. For a tuple $\bar{t} = (t_1, \ldots, t_n) \in X^n$, we let $f(\bar{t}) := (f(t_1), \ldots, f(t_n))$;
for an atom $A = R(\bar{t})$, we let $f(A) := R(f(\bar{t}))$; and for an instance $I$, we let
$f(I) := \{f(A) \mid A \in I\}$. A mapping $f\colon X \to Y$ is called legal for an instance $I$ if $\mathrm{dom}(I) \subseteq X$, and
$f(c) = c$ for all $c \in \mathrm{const}(I)$. The set of all mappings that are legal for $I$ is denoted by
$\mathrm{legal}(I)$. For a tuple $\bar{t} = (t_1, \ldots, t_k)$, we sloppily write $f\colon \bar{t} \to Y$ for a mapping $f\colon X \to Y$
with $X = \{t_1, \ldots, t_k\}$. Given $\bar{t} = (t_1, \ldots, t_k)$, $\bar{u} = (u_1, \ldots, u_l)$, and an element $v$, we also
write $v \in \bar{t}$ if $v \in \{t_1, \ldots, t_k\}$, and we let $\bar{t} \cap \bar{u} := \{t_1, \ldots, t_k\} \cap \{u_1, \ldots, u_l\}$.

Let $I$ and $J$ be instances. A homomorphism from $I$ to $J$ is a mapping $h\colon \mathrm{dom}(I) \to \mathrm{dom}(J)$
such that $h \in \mathrm{legal}(I)$ and $h(I) \subseteq J$. If $h(I) = J$, then $J$ is called a homomorphic
image of $I$. $I$ and $J$ are homomorphically equivalent if there is a homomorphism from $I$ to
$J$, and a homomorphism from $J$ to $I$. An isomorphism from $I$ to $J$ is a homomorphism $h$
from $I$ to $J$ such that $h$ is bijective, and $h^{-1}$ is a homomorphism from $J$ to $I$. If there is an
isomorphism from $I$ to $J$, we call $I$ and $J$ isomorphic, and denote this by $I \cong J$. A core is
an instance $K$ such that there is no homomorphism from $K$ to a proper subinstance of $K$.
A core of $I$ is a core $K \subseteq I$ such that there is a homomorphism from $I$ to $K$. It is known
[19] that if $I$ and $J$ are homomorphically equivalent, $K$ is a core of $I$, and $K'$ is a core of $J$,
then $K \cong K'$. In particular, any two cores of $I$ are isomorphic, so that we can speak of the
core of $I$.
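To make the notions of homomorphism and core concrete, the following brute-force Python sketch may help. It is not an efficient algorithm and is not taken from the article: the encoding of instances as sets of `(relation, tuple)` atoms and the convention that nulls are strings starting with an underscore are assumptions made here for illustration.

```python
from itertools import product

def is_null(v):
    """Assumed encoding: nulls are strings starting with '_'; all else is a constant."""
    return isinstance(v, str) and v.startswith("_")

def dom(instance):
    """Active domain: all values occurring in some atom."""
    return {v for (_, args) in instance for v in args}

def apply(h, instance):
    """Extend the mapping h from values to atoms and instances."""
    return {(rel, tuple(h[v] for v in args)) for (rel, args) in instance}

def homomorphisms(i, j):
    """Yield all h: dom(i) -> dom(j) that fix constants and satisfy h(i) <= j."""
    nulls = sorted(v for v in dom(i) if is_null(v))
    consts = {v: v for v in dom(i) if not is_null(v)}
    targets = sorted(dom(j), key=str)
    for image in product(targets, repeat=len(nulls)):
        h = {**consts, **dict(zip(nulls, image))}
        if apply(h, i) <= j:
            yield h

def core(i):
    """Shrink i as long as some homomorphism maps it onto a proper subinstance."""
    current = set(i)
    changed = True
    while changed:
        changed = False
        for h in homomorphisms(current, current):
            image = apply(h, current)
            if image < current:  # proper subinstance reached
                current, changed = image, True
                break
    return current

# Hypothetical instance: E(a, a) and E(a, _1); mapping _1 to a collapses it.
example = {("E", ("a", "a")), ("E", ("a", "_1"))}
example_core = core(example)
```

The loop stops exactly when no homomorphism maps the current instance into a proper subinstance, which is the defining property of a core; the search is exponential in the number of nulls, so this is only suitable for tiny examples.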

Given some property $P$ of instances, a minimal instance with property $P$ is an instance
$I$ with property $P$ such that there is no instance $J \subsetneq I$ with property $P$.
2.2. Queries. As usual [1], a $k$-ary query over a schema $\sigma$ is a mapping from instances over
$\sigma$ to subsets of $\mathrm{Dom}^k$ that is $C$-generic for some finite set $C \subseteq \mathrm{Const}$ (i.e., the query is invariant under
renamings of values in $\mathrm{Dom} \setminus C$). In the context of queries defined by logical formulas, we
will often use the words formula and query as synonyms. Whenever we speak of a first-order
formula (FO-formula) over a schema $\sigma$, we mean an FO-formula over the vocabulary that
consists of all relation symbols in $\sigma$ and all constants in Const.

Let $\varphi$ be an FO-formula over $\sigma$, and let $\mathrm{dom}(\varphi)$ be the set of all constants in $\varphi$. An
assignment for $\varphi$ in an instance $I$ is a mapping $\alpha$ from the free variables of $\varphi$ to
$\mathrm{dom}(I) \cup \mathrm{dom}(\varphi)$, which we extend to Const via $\alpha(c) := c$ for all $c \in \mathrm{Const}$. We write $I \models \varphi(\alpha)$ to
indicate that $\varphi$ is satisfied in $I$ under $\alpha$ in the naive sense. The relation $\models$ is defined as usual,
the only difference being that constants in $\varphi$ are interpreted by themselves, and quantifiers
range over $\mathrm{dom}(I) \cup \mathrm{dom}(\varphi)$. That is, we apply the active domain semantics [1]. For example,
we have $I \models R(u_1, \ldots, u_{\mathrm{ar}(R)})(\alpha)$ precisely if $(\alpha(u_1), \ldots, \alpha(u_{\mathrm{ar}(R)})) \in R^I$; $I \models (u_1 = u_2)(\alpha)$
precisely if $\alpha(u_1) = \alpha(u_2)$; and $I \models (\exists x\, \psi)(\alpha)$ precisely if there is a $v \in \mathrm{dom}(I) \cup \mathrm{dom}(\psi)$
with $I \models \psi(\alpha[v/x])$, where $\alpha[v/x]$ is the assignment defined like $\alpha$, except that $x$ is mapped
to $v$. For an FO-formula $\varphi(x_1, \ldots, x_k)$ and a tuple $\bar{u} = (u_1, \ldots, u_k) \in (\mathrm{dom}(I) \cup \mathrm{dom}(\varphi))^k$,
we often write $I \models \varphi(\bar{u})$ instead of $I \models \varphi(\alpha)$, where $\alpha(x_i) = u_i$ for each $i \in \{1, \ldots, k\}$.

A query $q(\bar{x})$ over $\sigma$ is monotonic if $q(I) \subseteq q(J)$ for all instances $I, J$ over $\sigma$ with $I \subseteq J$.
It is easy to see that all queries preserved under homomorphisms are monotonic. Here, a
query $q(\bar{x})$ over $\sigma$ is preserved under homomorphisms if and only if for all instances $I, J$
over $\sigma$, all homomorphisms $h$ from $I$ to $J$, and all tuples $\bar{t} \in q(I)$, we have $h(\bar{t}) \in q(J)$.
For example, conjunctive queries, unions of conjunctive queries, Datalog queries, and the
$\mathrm{Datalog}^{C(\neq)}$ queries of [5] are preserved under homomorphisms. Unions of conjunctive
queries with inequalities (see, e.g., [10] for a definition) are an example of monotonic queries
that are not preserved under homomorphisms.

At various places in sections [SHSl we will need formulas of the infinitary logic $L_{\infty\omega}$.
An $L_{\infty\omega}$ formula over a schema $\sigma$ is built from atomic FO formulas over $\sigma$ using negation,
existential quantification, universal quantification, infinitary disjunctions $\bigvee \Phi$, where $\Phi$ is
an arbitrary set of $L_{\infty\omega}$ formulas over $\sigma$, and infinitary conjunctions $\bigwedge \Phi$, where $\Phi$ is an
arbitrary set of $L_{\infty\omega}$ formulas over $\sigma$. The semantics of infinitary disjunctions and infinitary
conjunctions is the obvious one: for an assignment $\alpha$ of the variables that occur in the
formulas in $\Phi$ we have $I \models \bigvee \Phi(\alpha)$ if and only if there is some $\varphi \in \Phi$ with $I \models \varphi(\alpha)$, and
$I \models \bigwedge \Phi(\alpha)$ if and only if for all $\varphi \in \Phi$, $I \models \varphi(\alpha)$.

2.3. Data Exchange. A schema mapping $M = (\sigma, \tau, \Sigma)$ consists of disjoint schemas $\sigma$ and
$\tau$, called source schema and target schema, respectively, and a finite set $\Sigma$ of constraints in
some logical formalism over $\sigma \cup \tau$ [10]. To introduce and to study query answering semantics
in a general setting, we assume that for all schema mappings $M = (\sigma, \tau, \Sigma)$ considered in
this article, $\Sigma$ consists of $L_{\infty\omega}$ sentences over $\sigma \cup \tau$ (that are $C$-generic for some finite
$C \subseteq \mathrm{Const}$). For algorithmic results, however, we restrict attention to schema mappings
$M = (\sigma, \tau, \Sigma)$, where $\Sigma$ consists of source-to-target tuple generating dependencies (st-tgds)
and equality generating dependencies (egds), which have been prominently considered in data 



Nulls that may occur in $I$ are treated as if they were constants. In general, this may lead to
counter-intuitive semantics [31, 25], since distinct nulls may represent the same constant.
exchange. Here, an st-tgd is an FO sentence of the form

$\forall \bar{x}\, \forall \bar{y}\,(\varphi(\bar{x}, \bar{y}) \to \exists \bar{z}\, \psi(\bar{x}, \bar{z}))$,

where $\varphi$ is a conjunction of relational atomic FO formulas over $\sigma$ with free variables $\bar{x}\bar{y}$, and
$\psi$ is a conjunction of relational atomic FO formulas over $\tau$ with free variables $\bar{x}\bar{z}$. A full
st-tgd is an st-tgd without existentially quantified variables $\bar{z}$, and a LAV tgd is an st-tgd with
a single atomic formula in $\varphi$. An egd is an FO sentence of the form

$\forall \bar{x}\,(\varphi(\bar{x}) \to x_i = x_j)$,

where $\varphi$ is a conjunction of relational atomic FO formulas over $\tau$ with free variables $\bar{x}$, and
$x_i, x_j$ are variables in $\bar{x}$.

Let $M = (\sigma, \tau, \Sigma)$ be a schema mapping. A source instance for $M$ is a ground instance
over $\sigma$, and a target instance for $M$ is an instance over $\tau$. Given a source instance $S$ for $M$,
a solution for $S$ under $M$ is a target instance $T$ for $M$ such that $S \cup T \models \Sigma$, that is, the
instance $S \cup T$ over $\sigma \cup \tau$ satisfies all the constraints in $\Sigma$.

A universal solution for $S$ under $M$ is a solution $T$ for $S$ under $M$ such that for all
solutions $T'$ for $S$ under $M$ there is a homomorphism from $T$ to $T'$. Note that all universal
solutions for $S$ under $M$ are homomorphically equivalent, which implies that their cores
are isomorphic. Hence, up to isomorphism there is a unique target instance, denoted by
$\mathrm{Core}(M, S)$, that is isomorphic to the cores of all universal solutions for $S$ under $M$. For
many schema mappings $M$, $\mathrm{Core}(M, S)$ is a solution for $S$ under $M$. For example, if $\Sigma$
contains only st-tgds, then $\mathrm{Core}(M, S)$ is a solution for $S$ under $M$ [12], which can be
computed in polynomial time:

Theorem 2.1 ([12]). Let $M = (\sigma, \tau, \Sigma)$ be a schema mapping, where $\Sigma$ consists of st-tgds.
Then there is a polynomial time algorithm that, given a source instance $S$ for $M$, outputs
$\mathrm{Core}(M, S)$.

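Besides $\mathrm{Core}(M, S)$, the canonical universal solution for $S$ under $M$, which is denoted
by $\mathrm{CanSol}(M, S)$, plays an important role in data exchange. In the following, we give the
definition of $\mathrm{CanSol}(M, S)$ from [3] for the case that $\Sigma$ contains only st-tgds. Let $\mathcal{J}$ be the
set of all triples $(\theta, \bar{a}, \bar{b})$ such that $\theta$ is an st-tgd in $\Sigma$ of the form $\forall \bar{x}\, \forall \bar{y}\,(\varphi(\bar{x}, \bar{y}) \to \exists \bar{z}\, \psi(\bar{x}, \bar{z}))$,
and $S \models \varphi(\bar{a}, \bar{b})$. Starting from an empty target instance for $M$, $\mathrm{CanSol}(M, S)$ is created
by adding atoms for each element in $\mathcal{J}$ as follows. For each $j = (\theta, \bar{a}, \bar{b}) \in \mathcal{J}$, where
$\theta = \forall \bar{x}\, \forall \bar{y}\,(\varphi(\bar{x}, \bar{y}) \to \exists \bar{z}\, \psi(\bar{x}, \bar{z}))$, let $\bar{\bot}_j$ be a $|\bar{z}|$-tuple of pairwise distinct nulls such that for
all $j' \in \mathcal{J}$ with $j' \neq j$, the set of nulls in $\bar{\bot}_j$ is disjoint from the set of nulls in $\bar{\bot}_{j'}$, and add
the atoms in $\psi(\bar{a}, \bar{\bot}_j)$ to the target instance.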
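The construction of the canonical universal solution can be sketched in Python as follows. This is a minimal illustration under assumptions made here, not the article's algorithm: an st-tgd is encoded as a triple `(body, head, ex_vars)` whose atoms are `(relation, tuple_of_variables)` pairs, bodies contain variables only (no constants), source values are strings, and fresh nulls are named `_N1`, `_N2`, and so on.

```python
from itertools import count, product

def can_sol(st_tgds, source):
    """Build the canonical universal solution for a list of st-tgds.

    st_tgds: list of (body, head, ex_vars); atoms are (relation, variables).
    source:  set of ground atoms (relation, values).
    """
    values = sorted({v for (_, args) in source for v in args})
    fresh = count(1)  # counter for globally fresh null names
    target = set()
    for body, head, ex_vars in st_tgds:
        body_vars = sorted({v for (_, args) in body for v in args})
        # Try every assignment of source values to the body variables.
        for image in product(values, repeat=len(body_vars)):
            a = dict(zip(body_vars, image))
            if all((rel, tuple(a[v] for v in args)) in source
                   for (rel, args) in body):
                # One justification: give each existential variable its own fresh null.
                ext = dict(a)
                for z in ex_vars:
                    ext[z] = f"_N{next(fresh)}"
                target |= {(rel, tuple(ext[v] for v in args))
                           for (rel, args) in head}
    return target

# Hypothetical mapping: P(x) -> E(x, x) and P(x) -> exists z E(x, z).
tgd_full = ([("P", ("x",))], [("E", ("x", "x"))], [])
tgd_exist = ([("P", ("x",))], [("E", ("x", "z"))], ["z"])
src = {("P", ("a",))}
result = can_sol([tgd_full, tgd_exist], src)
```

Each satisfying body assignment corresponds to one triple $(\theta, \bar{a}, \bar{b})$, and the shared counter guarantees that the null tuples of distinct triples are disjoint, as the definition requires.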

2.4. The Certain Answers. Given a query $q(\bar{x})$ over $\tau$, and a set $\mathcal{T}$ of instances over $\tau$,
we define the certain answers to $q(\bar{x})$ on $\mathcal{T}$ by

$\mathrm{cert}(q, \mathcal{T}) := \bigcap \{ q(T) \mid T \in \mathcal{T} \}$.

The set of the certain answers to $q(\bar{x})$ on $M$ and $S$ under the OWA-semantics, as defined in
[10], is then

$\mathrm{cert}_{\mathrm{OWA}}(q, M, S) := \mathrm{cert}(q, \{T \mid T \text{ is a solution for } S \text{ under } M\})$.

If $q$ is preserved under homomorphisms, $\mathrm{cert}_{\mathrm{OWA}}(q, M, S)$ can be computed from a single
universal solution:
Proposition 2.2 ([10]). Let $M = (\sigma, \tau, \Sigma)$ be a schema mapping, let $S$ be a source instance
for $M$, let $T$ be a universal solution for $S$ under $M$, and let $q(\bar{x})$ be a query that is preserved
under homomorphisms. Then

$\mathrm{cert}_{\mathrm{OWA}}(q, M, S) = \{\bar{a} \in q(T) \mid \bar{a} \text{ contains only constants}\}$.
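The post-processing step of Proposition 2.2 amounts to discarding answer tuples that contain nulls. A minimal Python sketch, assuming (as a convention chosen here, not by the article) that nulls are strings starting with an underscore and that the relation $E$ is stored as a set of pairs:

```python
def is_null(value):
    """Assumed encoding: nulls are strings that start with an underscore."""
    return isinstance(value, str) and value.startswith("_")

def constant_tuples(answers):
    """Keep only the answer tuples that contain no nulls."""
    return {t for t in answers if not any(is_null(v) for v in t)}

# Hypothetical universal solution with one null, given as a set of E-pairs.
universal = {("a", "a"), ("a", "_N1")}
# q(x, y) := E(x, y) is a conjunctive query, hence preserved under homomorphisms,
# so evaluating it on the universal solution just returns the E-pairs.
q_of_t = set(universal)
certain = constant_tuples(q_of_t)
```

Here the tuple containing `_N1` is dropped, leaving only $(a, a)$ as a certain answer.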

3. Review of Non-Monotonic Semantics in Relational Data Exchange 

As mentioned in the introduction, for many non-monotonic queries, the OWA-semantics is
counter-intuitive. In [12], Fagin, Kolaitis, and Popa propose an alternative semantics, where
the set of answers to a query $q(\bar{x})$ on a schema mapping $M$ and a source instance $S$ is
defined by $\mathrm{cert}(q, \{T \mid T \text{ is a universal solution for } S \text{ under } M\})$. However, their semantics
has problems similar to those of the OWA-semantics [3, 22]. For example, note that Example 1.1 goes
through unchanged if the universal solution-semantics is used instead of the OWA-semantics.

Libkin [29] realized that the counter-intuitive behavior of the OWA-semantics and the
universal solution-semantics can be remedied by adopting the CWA. He then introduced
semantics based on the CWA. These semantics were designed for schema mappings defined
by st-tgds, but were later extended to schema mappings defined by st-tgds, t-tgds (see, e.g.,
[10] for a definition of t-tgds), and egds.

To define the CWA-semantics, we need to introduce CWA-solutions. For simplicity, we
give their definition only for schema mappings defined by st-tgds. Let $M = (\sigma, \tau, \Sigma)$ be a
schema mapping where $\Sigma$ consists of st-tgds, and let $S$ be a source instance for $M$. Libkin
identified the following requirements that should be satisfied by any CWA-solution $T$ for $S$
under $M$:

(1) Every atom in $T$ is justified by $M$ and $S$.

(2) Justifications are applied at most once.

(3) Every Boolean conjunctive query true in $T$ is a logical consequence of $S$ and $\Sigma$.

Here, a justification for an atom consists of an st-tgd $\forall \bar{x}\, \forall \bar{y}\,(\varphi(\bar{x}, \bar{y}) \to \exists \bar{z}\, \psi(\bar{x}, \bar{z}))$ in $\Sigma$ and
assignments $\bar{a}, \bar{b}$ for $\bar{x}, \bar{y}$ such that $S \models \varphi(\bar{a}, \bar{b})$. Such a justification can be applied with
an assignment $\bar{u}$ for $\bar{z}$, and would then justify the atoms in $\psi(\bar{a}, \bar{u})$. We require that all
atoms that are justified belong to $T$. Once all requirements are properly formalized, it turns
out that a CWA-solution for $S$ under $M$ is a universal solution for $S$ under $M$ that is a
homomorphic image of $\mathrm{CanSol}(M, S)$ [22].

We now recall the simplest of the four CWA-semantics introduced in [22], which we
henceforth call CWA-semantics. Recall that solutions in data exchange may contain nulls.
Nulls represent unknown constants; in particular, two distinct nulls may represent the same
constant. Treating nulls as constants may thus lead to counter-intuitive answers [25]. A
standard way to answer queries while taking into account the semantics of nulls is to return
the certain answers. To this end, one views an instance $T$ over $\tau$ as a set $\mathrm{poss}(T)$ of ground
instances obtained by substituting concrete constants for each null. Formally, we let

$\mathrm{poss}(T) := \{v(T) \mid v \text{ is a valuation of } T\}$,



Strictly speaking, Libkin considered sentences of the form $\forall \bar{x}\,(\varphi(\bar{x}) \to \exists \bar{z}\, \psi(\bar{x}, \bar{z}))$, where $\varphi$ is a first-order
formula over the source schema, and $\psi$ is a conjunction of relational atoms over the target schema.
where a valuation of $T$ is a mapping $v\colon \mathrm{dom}(T) \to \mathrm{Const}$ with $v \in \mathrm{legal}(T)$. Then the
certain answers to a query $q(\bar{x})$ on $T$ are

$\mathrm{cert}(q, T) := \mathrm{cert}(q, \mathrm{poss}(T))$.

Now, Libkin's idea was to define the CWA-answers to a query $q(\bar{x})$ on $M$ and $S$ by

$\mathrm{cert}_{\mathrm{CWA}}(q, M, S) := \bigcap \{\mathrm{cert}(q, T) \mid T \text{ is a CWA-solution for } S \text{ under } M\}$.

That is, $\mathrm{cert}_{\mathrm{CWA}}(q, M, S)$ is the set of the certain answers to $q(\bar{x})$ on the CWA-solutions
for $S$ under $M$, but instead of answering $q(\bar{x})$ on an individual CWA-solution $T$ by $q(T)$,
the certain answers are used. As shown in [22], $\mathrm{cert}_{\mathrm{CWA}}(q, M, S)$ can be computed from the
canonical solution: $\mathrm{cert}_{\mathrm{CWA}}(q, M, S) = \mathrm{cert}(q, \mathrm{CanSol}(M, S))$.

While the CWA-semantics works well in a number of situations (e.g., for schema mappings
defined by full st-tgds, full t-tgds and egds), and in particular resolves the problems
observed in [3, 22] and Example 1.1, one of the drawbacks of the CWA-semantics is that
it is not invariant under logically equivalent schema mappings. That is, there are schema
mappings $M_1 = (\sigma, \tau, \Sigma_1)$ and $M_2 = (\sigma, \tau, \Sigma_2)$, where $\Sigma_1$ is logically equivalent to $\Sigma_2$, and a
query $q(\bar{x})$ such that the answers to $q(\bar{x})$ with respect to $M_1$ differ from the answers to $q(\bar{x})$
with respect to $M_2$:

Example 3.1. Let $M_1 = (\sigma, \tau, \Sigma_1)$ and $M_2 = (\sigma, \tau, \Sigma_2)$ be schema mappings, where $\sigma$
contains a unary relation symbol $P$, $\tau$ contains a binary relation symbol $E$, and

$\Sigma_1 := \{\forall x\,(P(x) \to E(x,x))\}$,

$\Sigma_2 := \Sigma_1 \cup \{\forall x\,(P(x) \to \exists z\, E(x,z))\}$.

Then $M_1$ and $M_2$ are logically equivalent.

Let $S$ be an instance over $\sigma$ with $P^S = \{a\}$. Furthermore, let $T_1$ and $T_2$ be instances
over $\tau$ with $E^{T_1} = \{(a,a)\}$ and $E^{T_2} = \{(a,a), (a,\bot)\}$. Note that $T_i = \mathrm{CanSol}(M_i, S)$ for
each $i \in \{1,2\}$. Moreover, $T_1$ is the unique CWA-solution for $S$ under $M_1$, whereas $T_2$ is
a CWA-solution for $S$ under $M_2$. Thus, for the query

$q(x) := \exists z\,(E(x,z) \land \forall z'\,(E(x,z') \to z' = z))$,

we obtain different answers $\mathrm{cert}_{\mathrm{CWA}}(q, M_1, S) = \{a\}$ and $\mathrm{cert}_{\mathrm{CWA}}(q, M_2, S) = \emptyset$ to the same
query $q$ on logically equivalent schema mappings $M_1$ and $M_2$.
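The two answers in Example 3.1 can be checked with a small Python sketch that evaluates $q$ on the ground instances obtained from valuations of the nulls. The sketch rests on assumptions made here for illustration: $E$ is encoded as a set of pairs, nulls are strings starting with an underscore, and Const is replaced by a two-element stand-in `pool`. Restricting the valuations to a finite pool can only enlarge the intersection, so an empty result over the pool implies an empty set of certain answers; for the ground instance $T_1$ the computation is exact.

```python
from itertools import product

def is_null(v):
    """Assumed encoding: nulls are strings starting with '_'."""
    return v.startswith("_")

def q(e):
    """q(x): there is exactly one z with E(x, z); e is a ground set of pairs."""
    return {x for (x, _) in e
            if len({z for (x2, z) in e if x2 == x}) == 1}

def cert(e, pool):
    """Intersect q over all valuations of the nulls of e into the finite pool."""
    nulls = sorted({v for pair in e for v in pair if is_null(v)})
    answers = None
    for image in product(pool, repeat=len(nulls)):
        v = dict(zip(nulls, image))
        # Apply the valuation; constants are left unchanged.
        ground = {tuple(v.get(t, t) for t in pair) for pair in e}
        result = q(ground)
        answers = result if answers is None else answers & result
    return answers

pool = ["a", "b"]                 # hypothetical finite stand-in for Const
t1 = {("a", "a")}                 # E in T_1
t2 = {("a", "a"), ("a", "_1")}    # E in T_2, with one null
cert_t1 = cert(t1, pool)
cert_t2 = cert(t2, pool)
```

Mapping the null of $T_2$ to `b` yields two distinct $E$-successors of `a`, so `a` drops out of the intersection, whereas $T_1$ has no nulls and keeps `a`.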

Remark 3.2. If we replace $q$ by the query $q'$ which asks for all $x$ such that there are at least
two $z$ with $E(x,z)$, then $q'$ is answered differently on $M_1$ and $M_2$ with respect to the maybe
answers-semantics from [22]. The other two semantics in [22] are invariant under logically
equivalent schema mappings, though.

Intuitively, logically equivalent schema mappings specify the same translation of source 
data to the target schema, so it seems natural that the answer to a query is the same on 
logically equivalent schema mappings. Furthermore, invariance under logically equivalent 
schema mappings seems to be a good way to achieve a "syntax-independent" semantics, that
is, a semantics that, unlike the CWA-semantics, does not depend on the concrete presentation
of the sentences of the schema mapping. Gottlob, Pichler, and Savenkov [17] suggest a
different approach for achieving syntax-independence that is based on first normalizing a
schema mapping, and then "applying" the semantics. Let me mention that weaker notions of
equivalence between schema mappings have been considered in the literature [11]. Instead
of requiring invariance under logical equivalence, one could use any of these weaker notions. 
There is another drawback of the CWA-semantics, though: it does not necessarily
reflect the standard semantics of FO quantifiers. Typically, existential quantification $\exists x\, \varphi$ is
interpreted as: there is one $x$ satisfying $\varphi$, or there are two $x$ satisfying $\varphi$, or there are three
$x$ satisfying $\varphi$, and so on. But this is not necessarily reflected by the CWA-semantics:

Example 3.3. Let $M = (\sigma, \tau, \Sigma)$ be the schema mapping, where $\sigma$ contains a unary relation
symbol $P$, $\tau$ contains a binary relation symbol $E$, and $\Sigma$ consists of the st-tgd

$$\theta := \forall x\, (P(x) \to \exists z\, E(x, z)).$$

Let $S$ be the source instance for $M$ with $P^S = \{a\}$, and let $T := \mathrm{CanSol}(M, S)$. Then up
to renaming of nulls, $T$ is the unique CWA-solution for $S$ under $M$, and $E^T = \{(a, \bot)\}$.
Therefore, for the query

$$q(x) := \exists z\, (E(x, z) \land \forall z'\, (E(x, z') \to z' = z)),$$

we have $\mathrm{cert}_{\mathrm{CWA}}(q, M, S) = \{a\}$. In other words, $\mathrm{cert}_{\mathrm{CWA}}(q, M, S)$ excludes the possibility
that there is more than one value $z$ with $E(a, z)$. However, this seems to be too strict since
if we interpret $\exists z\, E(x, z)$ in the standard first-order way, $S$ and $\theta$ tell us that there is one
$z$ satisfying $E(a, z)$, or there are two $z$ satisfying $E(a, z)$, or there are three $z$ satisfying
$E(a, z)$, and so on. In particular, they explicitly state that it is possible that there is more
than one $z$ satisfying $E(a, z)$.
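
The effect described in Example 3.3 can be checked mechanically. The following is a minimal sketch, not part of the original development: relation $E$ is encoded as a Python set of pairs, the null is written as the hypothetical value `"_N1"`, and `q_answers` is an illustrative helper name.

```python
def q_answers(E):
    """Answers to q(x) := exists z (E(x,z) and forall z' (E(x,z') -> z' = z))."""
    answers = set()
    for x in {x for (x, _) in E}:
        successors = {z for (x2, z) in E if x2 == x}
        if len(successors) == 1:  # exactly one z with E(x, z)
            answers.add(x)
    return answers

# Canonical solution: a single labelled null stands for the unknown z.
can_sol = {("a", "_N1")}
# A ground solution with two witnesses, which theta also permits.
two_witnesses = {("a", "b1"), ("a", "b2")}

print(q_answers(can_sol))        # {'a'}
print(q_answers(two_witnesses))  # set()
```

The second instance is a solution for $S$ under $M$, yet it is not taken into account by the CWA-semantics, which is why $a$ is returned there.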

Remark 3.4. The example also shows that the potential certain answers semantics from
[22] does not necessarily reflect the standard semantics of FO quantifiers. For the remaining
two semantics in [22], we can replace $q$ by the query $q'$ asking for all $x$ for which there are
at least two $z$ with $E(x, z)$. Then the set of answers to $q'$ under those semantics will be
empty. In other words, they tell us that it is not possible that there is more than one $z$
that satisfies $E(x, z)$, which is not desired.



In [30], Libkin and Sirangelo proposed a generalization of the CWA-semantics based on
a combination of the CWA and the OWA, using which we can resolve the problem described
in Example 3.3. The idea is to annotate each position (occurrence of a variable) in the head
of an st-tgd as closed or open, where open positions correspond to those positions where
more than one value may be created. For example, recall the schema mapping $M$ and the
source instance $S$ from Example 3.3, and let $\alpha$ be the following annotation of $\theta$:

$$\theta_\alpha := \forall x\, (P(x) \to \exists z\, E(x^{\mathrm{closed}}, z^{\mathrm{open}})).$$

Then the valid solutions for $S$ under $M$ and $\alpha$ are all solutions for $S$ under $M$ of the form
$\{E(a, b) \mid b \in X\}$, where $X$ is a finite set of constants. That is, the first position - which
corresponds to the closed position in the head of $\theta_\alpha$ - is restricted to the value $a$ assigned to
$x$ when applying $\theta$ to $S$, while the second position - which corresponds to the open position
in the head of $\theta_\alpha$ - is unrestricted in the sense that an arbitrary finite number of values may
be created at this position. Let us write $\mathrm{sol}_\alpha(M, S)$ for the set of all these solutions. The
set of answers to a query $q(\bar{x})$ on $M$, $\alpha$ and $S$ is then

$$\mathrm{cert}_\alpha(q, M, S) := \mathrm{cert}(q, \mathrm{sol}_\alpha(M, S)).$$
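
To make the definition concrete, here is a small sketch that enumerates the valid solutions over a finite stand-in for Const and intersects the query answers. The three-element list `CONSTS`, the flag `second_position_open`, and the helper names are assumptions made for illustration only; with the second position closed, only single-witness instances are generated, which is how the sketch models a fully closed annotation.

```python
from itertools import combinations

CONSTS = ["a", "b", "c"]  # assumed finite stand-in for the infinite set Const

def solutions(consts, second_position_open):
    """Instances {E(a, b) | b in X} for nonempty X; |X| = 1 if z is closed."""
    max_size = len(consts) if second_position_open else 1
    for k in range(1, max_size + 1):
        for X in combinations(consts, k):
            yield frozenset(("a", b) for b in X)

def q_answers(E):
    """All x that have exactly one E-successor (the query q of Example 3.3)."""
    xs = {x for (x, _) in E}
    return {x for x in xs if len({z for (x2, z) in E if x2 == x}) == 1}

def certain(query, instances):
    """Intersection of the query answers over all given instances."""
    result = None
    for inst in instances:
        ans = query(inst)
        result = ans if result is None else result & ans
    return result if result is not None else set()

cert_open = certain(q_answers, solutions(CONSTS, True))
cert_closed = certain(q_answers, solutions(CONSTS, False))
print(cert_open, cert_closed)  # set() {'a'}
```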

In particular, it is easy to see that for the query $q$ in Example 3.3 we have $\mathrm{cert}_\alpha(q, M, S) = \emptyset$,
as desired. However, the "mixed world" semantics may still be counter-intuitive:

Example 3.5. Let $M = (\sigma, \tau, \Sigma)$ be defined by $\sigma = \{R\}$, $\tau = \{E, F\}$, and $\Sigma = \{\theta\}$, where

$$\theta := \forall x, y\, (R(x, y) \to \exists z\, (E(x, z) \land F(z, y))). \tag{3.1}$$

Intuitively, $\theta$ states that "if $R(x, y)$, then there is at least one $z$ such that $E(x, z)$ and $F(z, y)$
hold." There could be exactly one such $z$, but there could also be more than one such $z$. The
possibility that there are precisely two such $z$, or precisely three such $z$ et cetera is perfectly
consistent with $\theta$, and should not be denied when answering queries. So, it seems that the
only solutions for $S = \{R(a, b)\}$ under $M$ should be those solutions $T$ for which there is a
finite set $X \subseteq \mathrm{Const}$ such that $T = \{E(a, x) \mid x \in X\} \cup \{F(x, b) \mid x \in X\}$. In particular, we
should expect that the answer to

$$q(x, y) := \exists^{=1} z\, (E(x, z) \land F(z, y))$$

on $M$ and $S$ is empty, and that the answer to

$$q'(x) := \forall z\, (E(x, z) \to \exists y\, F(z, y))$$

on $M$ and $S$ is $\{a\}$.

Now consider an annotation $\alpha$ for $\theta$ where the occurrence of $z$ in $E(x, z)$ is closed.
According to the definition in [30], for all valid solutions $T$ for $S$ under $M$ and $\alpha$, and
all tuples $(c, d), (c', d') \in E^T$ we have $d = d'$, so that $\mathrm{cert}_\alpha(q, M, S) = \{(a, b)\}$. Hence,
$\mathrm{cert}_\alpha(q, M, S)$ excludes the possibility that there is more than one $z$ satisfying $E(x, z)$ and
$F(z, y)$, although $\theta$ explicitly states that it is possible that more than one such $z$ exists. The
same is true if the occurrence of $z$ in $F(z, y)$ is closed.

Let us finally consider an annotation $\alpha'$ where both occurrences of $z$ in the head of $\theta$
are open. According to the definition in [30], we have $\mathrm{cert}_{\alpha'}(q', M, S) = \emptyset$, since

$$T^* := \{E(a, c), E(a, c'), F(c, b)\}$$

would be a valid solution for $S$ under $M$ and $\alpha'$ according to this definition. Intuitively,
the "mixed world" semantics is "too open" in that it allows $E(a, c')$ to occur in $T^*$ without
enforcing that the corresponding atom $F(c', b)$ is present in $T^*$.
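
The difference between the intended solutions and $T^*$ can be verified directly. The sketch below is illustrative only: atoms are encoded as triples `(relation, first, second)`, the constant $c'$ is written `"c2"`, and the function name is a hypothetical helper that checks $q'$ for the single value $a$.

```python
def a_satisfies_q_prime(T):
    """q'(a): every z with E(a, z) has some y with F(z, y)."""
    e_successors = {z for (rel, x, z) in T if rel == "E" and x == "a"}
    f_sources = {z for (rel, z, y) in T if rel == "F"}
    return e_successors <= f_sources

# Intended solution for S = {R(a, b)} with X = {c, c2}.
intended = {("E", "a", "c"), ("E", "a", "c2"), ("F", "c", "b"), ("F", "c2", "b")}
# T* from the example: E(a, c2) occurs without a matching F(c2, b).
t_star = {("E", "a", "c"), ("E", "a", "c2"), ("F", "c", "b")}

print(a_satisfies_q_prime(intended))  # True
print(a_satisfies_q_prime(t_star))    # False
```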



Remark 3.6. The existential quantifier in (3.1) can be expressed via an infinite disjunction
over all possible choices of values for $z$ (recall that nulls are just place-holders for unknown
constants, so we do not have to consider nulls here), as in

$$\theta' := \forall x, y\, \Bigl(R(x, y) \to \bigvee_{c \in \mathrm{Const}} (E(x, c) \land F(c, y))\Bigr). \tag{3.2}$$

The interpretation of the existential quantifier discussed in Example 3.5 then corresponds to
an inclusive interpretation of the disjunction in (3.2). Intuitively, the desired set of solutions
for $S = \{R(a, b)\}$ under $M$ - the set of solutions of the form $T = \{E(a, x) \mid x \in X\} \cup
\{F(x, b) \mid x \in X\}$ for some finite set $X \subseteq \mathrm{Const}$ - is the smallest set of solutions that
reflects an inclusive interpretation of the disjunction in (3.2).



Remark 3.7. Afrati and Kolaitis [2] showed that a restriction of Libkin's CWA is useful
for answering aggregate queries. Their semantics is defined with respect to schema mappings
specified by st-tgds. In principle, we could use it to answer non-aggregate queries as
follows. Let $M = (\sigma, \tau, \Sigma)$ be a schema mapping where $\Sigma$ is a set of st-tgds, let $S$ be a
source instance for $M$, and let $q(\bar{x})$ be a query over $\tau$. Instead of answering $q(\bar{x})$ by the
certain answers to $q(\bar{x})$ on $\mathrm{poss}(\mathrm{Core}(M, S))$, we answer $q(\bar{x})$ by the certain answers to $q(\bar{x})$
on the set of all endomorphic images of $\mathrm{CanSol}(M, S)$. Here, an endomorphic image of
$\mathrm{CanSol}(M, S)$ is an instance $T$ such that $T = h(\mathrm{CanSol}(M, S))$ for some homomorphism $h$
from $\mathrm{CanSol}(M, S)$ to $\mathrm{CanSol}(M, S)$. However, the endomorphic images-semantics seems
to be too strong: it is stronger than the CWA-semantics. In particular, Example 3.1 shows
that the endomorphic images-semantics is not invariant under logically equivalent schema
mappings, and Example 3.3 shows that it does not necessarily reflect the standard semantics
of FO quantifiers.

Our goal is to develop a semantics for answering non-monotonic queries that is invariant
under logically equivalent schema mappings, and is just "open enough" to interpret existential
quantifiers (when viewed as an infinite disjunction, as suggested in Remark 3.6) inclusively.
We start by studying semantics for answering non-monotonic queries on deductive databases.

Remark 3.8. Note the following side-effect of invariance under logically equivalent schema
mappings. Consider the schema mappings $M_1$, $M_2$ and source instance $S$ from Example 3.1.
Then under a reasonable closed world semantics, $T = \{E(a, a)\}$ should be the only "valid
solution" for $S$ under $M_1$. Invariance under logical equivalence thus enforces that $T$ is also
the only "valid solution" for $S$ under $M_2$. This seems to be counter-intuitive, but only as
long as one considers the two st-tgds in $M_2$ in isolation.



4. Deductive Databases and Relational Data Exchange 

Query answering on deductive databases and query answering in relational data exchange 
are very similar problems. Both require answering a query on a ground database that is 
equipped with a set of constraints. On the other hand, answering non-monotonic queries 
on deductive databases is a well-studied topic. In this section, we translate some of the 
semantics that were proposed to answer non-monotonic queries on deductive databases into 
the context of relational data exchange. 

A deductive database [13] over a schema $\sigma$ is a set of FO sentences, called clauses, of
the form

$$\forall \bar{x}\, (\lnot R_1(\bar{y}_1) \lor \dots \lor \lnot R_m(\bar{y}_m) \lor R'_1(\bar{z}_1) \lor \dots \lor R'_n(\bar{z}_n)), \tag{4.1}$$

where $m$ and $n$ are nonnegative integers with $m + n \geq 1$, $R_1, \dots, R_m, R'_1, \dots, R'_n$ are relation
symbols in $\sigma$, and $\bar{y}_1, \dots, \bar{y}_m, \bar{z}_1, \dots, \bar{z}_n$ are tuples containing elements of Const and $\bar{x}$. A
model of a deductive database $D$ over $\sigma$ is a ground instance $I$ over $\sigma$ with $I \models D$ (i.e., $I$
satisfies all clauses in $D$), and a query $q(\bar{x})$ is usually answered by $\mathrm{cert}(q, \mathcal{I})$, where $\mathcal{I}$ is a
set of models of $D$ that depends on the particular semantics.

Several semantics for answering non-monotonic queries on deductive databases were
proposed (see, e.g., [36], [35], [38], [7], and the survey articles [13], [9]). Often, these semantics
can be applied with some minor modifications to more general sets of logical sentences such
as

$$D_{M,S} := \Sigma \cup \{R(\bar{t}) \mid R \in \sigma,\ \bar{t} \in R^S\} \cup \{\lnot R(\bar{t}) \mid R \in \sigma,\ \bar{t} \in \mathrm{Const}^{\mathrm{ar}(R)} \setminus R^S\},$$

where $M = (\sigma, \tau, \Sigma)$ is a schema mapping, and $S$ is a source instance for $M$. Note that, if $\Sigma$
consists only of full st-tgds, then $D_{M,S}$ is logically equivalent to a deductive database, since
any full st-tgd of the form $\forall \bar{x}\, (R_1(\bar{y}_1) \land \dots \land R_m(\bar{y}_m) \to R'(\bar{z}))$ is logically equivalent to the
clause $\forall \bar{x}\, (\lnot R_1(\bar{y}_1) \lor \dots \lor \lnot R_m(\bar{y}_m) \lor R'(\bar{z}))$.

In the following, we concentrate on Reiter's CWA [36], since this is the basic assumption
underlying all previous approaches for non-monotonic query answering in relational
data exchange. However, we also consider variants of Reiter's CWA such as semantics
based on Minker's generalized CWA (GCWA) [35], Yahya's and Henschen's extended GCWA
(EGCWA) [38] as well as Chan's possible worlds semantics (PWS) [7].

4.1. Reiter's Closed World Assumption (RCWA). Reiter's closed world assumption
(RCWA)¹, formalized by Reiter in [36], assumes that every ground atom that is not implied
by a database is false. This is a common assumption for relational databases.

Reiter formalized the RCWA as follows. For a deductive database $D$ and a formula $\varphi$,
we write $D \models \varphi$ if and only if for all instances $I$ with $I \models D$, we have $I \models \varphi$. Given a
deductive database $D$ over a schema $\sigma$, let

$$\overline{D} := \{\lnot R(\bar{t}) \mid R \in \sigma,\ \bar{t} \in \mathrm{Const}^{\mathrm{ar}(R)},\ D \not\models R(\bar{t})\},$$

which contains negations of all ground atoms $R(\bar{t})$ (i.e., $\bar{t}$ is a tuple of constants) that are
assumed to be false under the RCWA. The models of $D \cup \overline{D}$ are called RCWA-models of $D$.
Under the RCWA, a query $q(\bar{x})$ over $\sigma$ is answered by $\mathrm{cert}(q, \mathcal{I})$, where $\mathcal{I}$ is the set of all
RCWA-models of $D$.

Translated into the relational data exchange framework, we obtain: 

Definition 4.1 (RCWA-solution, RCWA-answers). Let $M = (\sigma, \tau, \Sigma)$ be a schema mapping,
let $S$ be a source instance for $M$, and let $q(\bar{x})$ be a query over $\tau$.

(1) An RCWA-solution for $S$ under $M$ is a ground target instance $T$ for $M$ such that $S \cup T$
is an RCWA-model of $D_{M,S}$ (recall the definition of $D_{M,S}$ given earlier in this section).

(2) We call $\mathrm{cert}_{\mathrm{RCWA}}(q, M, S) := \mathrm{cert}(q, \mathcal{I})$, where $\mathcal{I}$ is the set of all RCWA-solutions for $S$
under $M$, the RCWA-answers to $q(\bar{x})$ on $M$ and $S$.

Note that RCWA-solutions are ground (i.e., contain no nulls), in contrast to other notions of 
solutions such as plain solutions, universal solutions, or CWA-solutions presented in previous 
sections. 

The RCWA is a very strong assumption. For example, if an RCWA-solution for S under 
M exists, it is the unique minimal (ground) solution for S under M: 

Proposition 4.2. Let $M$ be a schema mapping, and let $S$ be a source instance for $M$.
Then a solution $T$ for $S$ under $M$ is an RCWA-solution for $S$ under $M$ if and only if $T$ is
contained in every ground solution for $S$ under $M$.

Proof. Suppose $T$ is an RCWA-solution for $S$ under $M$, and let $T'$ be a ground solution for
$S$ under $M$. If there were an atom $R(\bar{t}) \in T \setminus T'$, then $D_{M,S} \not\models R(\bar{t})$, so that $\lnot R(\bar{t}) \in \overline{D_{M,S}}$,
and hence $T$ would not be an RCWA-solution for $S$ under $M$. Therefore, $T \subseteq T'$. To prove the other
direction, suppose that $T$ is contained in every ground solution for $S$ under $M$. Then, for
all atoms $R(\bar{t}) \in T$ we have $D_{M,S} \models R(\bar{t})$, so that $T$ is an RCWA-solution for $S$ under $M$. □



¹ We write RCWA instead of CWA to avoid confusion with Libkin's formalization of the CWA.

It is also not hard to see that $\mathrm{cert}_{\mathrm{RCWA}}$ coincides with $\mathrm{cert}_{\mathrm{CWA}}$ on schema mappings
defined by full st-tgds.

Proposition 4.3. Let $M = (\sigma, \tau, \Sigma)$ be a schema mapping, where $\Sigma$ consists of full st-tgds,
let $S$ be a source instance for $M$, and let $q(\bar{x})$ be a query over $\tau$. Then, $\mathrm{cert}_{\mathrm{RCWA}}(q, M, S) =
\mathrm{cert}_{\mathrm{CWA}}(q, M, S)$.

Proof. Since $\Sigma$ consists of full st-tgds, there is a unique minimal ground solution $T_0$ for
$S$ under $M$, which is also the unique CWA-solution for $S$ under $M$, and, by Proposition 4.2,
the unique RCWA-solution for $S$ under $M$. Consequently, $\mathrm{cert}_{\mathrm{RCWA}}(q, M, S) =
\mathrm{cert}_{\mathrm{CWA}}(q, M, S) = q(T_0)$. □

However, for schema mappings that contain non-full st-tgds, $\mathrm{cert}_{\mathrm{RCWA}}$ may lead to
answers that are inconsistent with $M$ and $S$. This is illustrated by the following example,
which is based on Example 8 in [36].

Example 4.4. Let $M = (\{P\}, \{E\}, \Sigma)$, where $\Sigma := \{\forall x\, (P(x) \to \exists z\, E(x, z))\}$, and let $S$
be the source instance for $M$ with $P^S = \{a\}$. Since there is no unique minimal ground
solution, Proposition 4.2 implies that there is no RCWA-solution for $S$ under $M$. Consequently,
the RCWA-answers to $q(x) := \exists z\, E(x, z)$ on $M$ and $S$ are empty. In other words,
$\mathrm{cert}_{\mathrm{RCWA}}(q, M, S)$ tells us that there is no value $z$ satisfying $E(a, z)$. This is clearly
inconsistent with $M$ and $S$, which tell us that there is a value $z$ satisfying $E(a, z)$, and hence that
the set of answers should be $\{a\}$.
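
Proposition 4.2 suggests a direct, if naive, way to look for an RCWA-solution: intersect all ground solutions and check whether the result is itself a solution. The sketch below does this over an assumed two-element stand-in for Const, with $E$-atoms encoded as pairs. The function names, and the full st-tgd $\forall x\, (P(x) \to E(x, x))$ used for contrast, are illustrative assumptions and do not come from the text.

```python
from itertools import combinations, product

CONSTS = ["a", "b"]  # assumed finite stand-in for Const
ATOMS = [(x, z) for x, z in product(CONSTS, repeat=2)]  # candidate E-atoms

def all_instances():
    """Every ground instance over E with values from CONSTS."""
    for k in range(len(ATOMS) + 1):
        for atoms in combinations(ATOMS, k):
            yield frozenset(atoms)

def rcwa_solution(is_solution):
    """Intersection of all ground solutions, or None if it is not a solution."""
    sols = [T for T in all_instances() if is_solution(T)]
    core = frozenset.intersection(*sols)
    return core if is_solution(core) else None

def non_full(T):
    """Example 4.4 with P^S = {a}: some z with E(a, z) must exist."""
    return any(("a", z) in T for z in CONSTS)

def full(T):
    """Hypothetical full st-tgd P(x) -> E(x, x) with P^S = {a}."""
    return ("a", "a") in T

print(rcwa_solution(non_full))  # None
print(rcwa_solution(full))      # frozenset({('a', 'a')})
```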

4.2. The Generalized Closed World Assumption (GCWA). Minker [35] extended
Reiter's CWA to the generalized closed world assumption (GCWA) as follows. Recall the
definition of minimal instance possessing some property from Section 2. Let $D$ be a deductive
database over a schema $\sigma$. A minimal model of $D$ is a minimal instance with the property
of being a model of $D$. Let

$$\overline{D} := \{\lnot R(\bar{t}) \mid R \in \sigma,\ \bar{t} \in \mathrm{Const}^{\mathrm{ar}(R)},\ \bar{t} \notin R^I \text{ for all minimal models } I \text{ of } D\},$$

which, analogous to $\overline{D}$ for the case of the RCWA, contains negations of all ground atoms that
are assumed to be false under the GCWA. The models of $D \cup \overline{D}$ are called GCWA-models of
$D$, and a query $q(\bar{x})$ over $\sigma$ is answered by $\mathrm{cert}(q, \mathcal{I})$, where $\mathcal{I}$ is the set of all GCWA-models
of $D$.

The intuition behind the above definitions is that each ground atom in some minimal 
model of D is in some sense an atom that D "speaks" about. For ground atoms that do 
not occur in any minimal model of D, this means that they are merely "invented", and can 
therefore safely be assumed to be false. 

Translated into the relational data exchange framework, we obtain: 

Definition 4.5 (GCWA-solution, GCWA-answers). Let $M = (\sigma, \tau, \Sigma)$ be a schema mapping,
let $S$ be a source instance for $M$, and let $q(\bar{x})$ be a query over $\tau$.

(1) A GCWA-solution for $S$ under $M$ is a ground target instance $T$ for $M$ such that $S \cup T$
is a GCWA-model of $D_{M,S}$.

(2) We call $\mathrm{cert}_{\mathrm{GCWA}}(q, M, S) := \mathrm{cert}(q, \mathcal{I})$, where $\mathcal{I}$ is the set of the GCWA-solutions for
$S$ under $M$, the GCWA-answers to $q(\bar{x})$ on $M$ and $S$.

We have the following characterization of GCWA-solutions: 

Proposition 4.6. Let $M$ be a schema mapping, and let $S$ be a source instance for $M$. Then
a solution $T$ for $S$ under $M$ is a GCWA-solution for $S$ under $M$ if and only if there is a
set $\mathcal{T}$ of minimal ground solutions for $S$ under $M$ such that $T \subseteq \bigcup \mathcal{T}$.

Proof. Suppose $T$ is a GCWA-solution for $S$ under $M$. Then for each atom $A$ in $T$, there is
a minimal ground solution $T_A$ for $S$ under $M$ with $A \in T_A$ (otherwise, $\lnot A \in \overline{D_{M,S}}$, so that
$T$ would not be a GCWA-solution for $S$ under $M$). But then we have $T \subseteq \bigcup_{A \in T} T_A$.

Suppose now that $\mathcal{T}$ is a set of minimal ground solutions for $S$ under $M$ such that
$T \subseteq \bigcup \mathcal{T}$. Then, for all atoms $A \in T$ we have $\lnot A \notin \overline{D_{M,S}}$, and it follows that $T$ is a
GCWA-solution for $S$ under $M$. □

Similar to the RCWA-answers semantics, it can be shown that $\mathrm{cert}_{\mathrm{GCWA}}$ coincides with
$\mathrm{cert}_{\mathrm{CWA}}$ on schema mappings defined by full st-tgds. Moreover, $\mathrm{cert}_{\mathrm{GCWA}}$ leads to the
desired answers to the query in Example 4.4:

Example 4.7. Recall the schema mapping $M$, the source instance $S$, and the query $q$ from
Example 4.4. We now have

$$\overline{D_{M,S}} = \{\lnot P(b) \mid b \in \mathrm{Const},\ b \neq a\} \cup \{\lnot E(b, c) \mid b, c \in \mathrm{Const},\ b \neq a\},$$

because each atom of the form $E(a, c)$ is true in some minimal model of $D_{M,S}$, and each
atom of the form $E(b, c)$ with $b \neq a$ is false in all minimal models of $D_{M,S}$. Therefore, the
GCWA-solutions for $S$ under $M$ are precisely the target instances $T$ for $M$ for which there
is a finite nonempty set $B \subseteq \mathrm{Const}$ with $T = T_B$, where $E^{T_B} = \{(a, b) \mid b \in B\}$. It follows
that $\mathrm{cert}_{\mathrm{GCWA}}(q, M, S) = \{a\}$, as desired.
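
The characterization in Proposition 4.6 can be turned into a brute-force computation on a finite domain. The following sketch assumes a three-element stand-in for Const and encodes $E$-atoms as pairs; all names are illustrative. It recovers the seven instances $T_B$ with $\emptyset \neq B \subseteq \{a, b, c\}$ and the certain answer $\{a\}$.

```python
from itertools import combinations

CONSTS = ["a", "b", "c"]  # assumed finite stand-in for Const
ATOMS = [(x, z) for x in CONSTS for z in CONSTS]  # candidate E-atoms

def is_solution(T):
    """Example 4.4 with P^S = {a}: T must contain E(a, z) for some z."""
    return any(("a", z) in T for z in CONSTS)

def all_instances():
    for k in range(len(ATOMS) + 1):
        for atoms in combinations(ATOMS, k):
            yield frozenset(atoms)

def gcwa_solutions():
    """Solutions contained in the union of all minimal ground solutions."""
    sols = [T for T in all_instances() if is_solution(T)]
    minimal = [T for T in sols if not any(U < T for U in sols)]
    covered = frozenset().union(*minimal)
    return [T for T in sols if T <= covered]

def q(T):
    """q(x) := exists z E(x, z)."""
    return {x for (x, z) in T}

gcwa = gcwa_solutions()
cert_gcwa = set.intersection(*(q(T) for T in gcwa))
print(len(gcwa), cert_gcwa)  # 7 {'a'}
```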

Nevertheless, there are cases where the GCWA is still quite unsatisfactory, as shown by 
the following example: 

Example 4.8. Consider a slight extension of the schema mapping from Example 4.4, namely
$M = (\{P\}, \{E, F\}, \Sigma)$, where $\Sigma$ consists of the st-tgd

$$\theta := \forall x\, \bigl(P(x) \to \exists z_1 \exists z_2\, (E(x, z_1) \land F(z_1, z_2))\bigr).$$

Let $S$ be the source instance for $M$ with $P^S = \{a\}$. Then,

$$\overline{D_{M,S}} = \{\lnot P(b) \mid b \in \mathrm{Const},\ b \neq a\} \cup \{\lnot E(b, c) \mid b, c \in \mathrm{Const},\ b \neq a\}.$$

Note that for all $b, c \in \mathrm{Const}$ we have $\lnot F(b, c) \notin \overline{D_{M,S}}$, since $S \cup T$, where $T$ is the target
instance for $M$ with $E^T = \{(a, b)\}$ and $F^T = \{(b, c)\}$, is a minimal model of $D_{M,S}$. So, the
GCWA-solutions for $S$ under $M$ are the target instances $T$ for $M$ for which there is a finite
nonempty set $B \subseteq \mathrm{Const}$ with the following properties: (1) $E^T = \{(a, b) \mid b \in B\}$, and (2)
for at least one $b \in B$ there is some $c \in \mathrm{Const}$ with $(b, c) \in F^T$. In particular, the target
instance $T^*$ with $E^{T^*} = \{(a, b)\}$ and $F^{T^*} = \{(b, c), (d, e)\}$ is a GCWA-solution for $S$ under
$M$. For the Boolean query

$$q := \forall z_1 \forall z_2\, (F(z_1, z_2) \to \exists x\, E(x, z_1))$$

we thus have $\mathrm{cert}_{\mathrm{GCWA}}(q, M, S) = \emptyset$.

So, $\mathrm{cert}_{\mathrm{GCWA}}(q, M, S)$ tells us that it is possible that there is a tuple $(b, c)$ in $F$ for
which $(a, b)$ is not in $E$. However, $\theta$ and $S$ do not "mention" this possibility. In particular,
$\theta$ and $S$ only tell us that there are one or more pairs $(b, c) \in \mathrm{Const}^2$ such that $E(a, b)$ and
$F(b, c)$ occur together in a solution. Thus, whenever $E(a, b)$ is present for some $b \in \mathrm{Const}$,
then $F(b, c)$ should be present for some $c \in \mathrm{Const}$. Similarly, whenever $F(b, c)$ is present for
some $b, c \in \mathrm{Const}$, then $E(a, b)$ should be present.
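
The claim that $T^*$ is a GCWA-solution on which $q$ fails can be checked with a few lines of code. The sketch below relies on two assumptions that are stated here rather than derived: Const is replaced by the five constants named in the example, and the minimal ground solutions are taken to be exactly the instances $\{E(a, z_1), F(z_1, z_2)\}$. Atoms are encoded as triples `(relation, first, second)`.

```python
CONSTS = ["a", "b", "c", "d", "e"]  # assumed finite stand-in for Const

def is_solution(T):
    """theta with P^S = {a}: some z1, z2 with E(a, z1) and F(z1, z2)."""
    return any(("E", "a", z1) in T and ("F", z1, z2) in T
               for z1 in CONSTS for z2 in CONSTS)

def minimal_solutions():
    """Hand-derived minimal ground solutions {E(a, z1), F(z1, z2)}."""
    return [frozenset({("E", "a", z1), ("F", z1, z2)})
            for z1 in CONSTS for z2 in CONSTS]

def is_gcwa_solution(T):
    """Proposition 4.6: a solution inside the union of minimal solutions."""
    covered = frozenset().union(*minimal_solutions())
    return is_solution(T) and frozenset(T) <= covered

def q(T):
    """q := forall z1, z2 (F(z1, z2) -> exists x E(x, z1))."""
    e_targets = {v for (rel, u, v) in T if rel == "E"}
    return all(u in e_targets for (rel, u, v) in T if rel == "F")

t_star = {("E", "a", "b"), ("F", "b", "c"), ("F", "d", "e")}
print(is_gcwa_solution(t_star), q(t_star))  # True False
```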



4.3. Extensions of the GCWA. Various extensions of the GCWA have been proposed.
One of these extensions is the extended GCWA (EGCWA) by Yahya and Henschen [38],
which restricts the set of models of a deductive database $D$ to the minimal models of $D$. So,
given a schema mapping $M = (\sigma, \tau, \Sigma)$ and a source instance $S$ for $M$, an EGCWA-solution
for $S$ under $M$ can be defined as a ground minimal solution for $S$ under $M$, and given a
query $q(\bar{x})$ we can define

$$\mathrm{cert}_{\mathrm{EGCWA}}(q, M, S) := \mathrm{cert}(q, \mathcal{I}),$$

where $\mathcal{I}$ is the set of all EGCWA-solutions for $S$ under $M$. Then, for the schema mapping
$M$, the source instance $S$ for $M$, and the query $q$ in Example 4.7, $\mathrm{cert}_{\mathrm{EGCWA}}(q, M, S) =
\mathrm{cert}_{\mathrm{GCWA}}(q, M, S)$, and for the schema mapping $M$, the source instance $S$ for $M$, and the
query $q$ in Example 4.8, $\mathrm{cert}_{\mathrm{EGCWA}}(q, M, S) \neq \emptyset$, as desired. However, the EGCWA seems
to be too strong in the sense that it removes too many solutions from the set of all solutions.
More precisely, it interprets existential quantifiers (when viewed as disjunctions) exclusively
rather than inclusively. We illustrate this by the following example:²

Example 4.9. Let $M = (\{P\}, \{E\}, \Sigma)$ be a schema mapping, where $\Sigma$ consists of

$$\theta := \forall x\, (P(x) \to \exists^{2,3} z\, E(x, z)),$$

where $\exists^{2,3} z\, E(x, z)$ is an abbreviation for "there exist two or three $z$ such that $E(x, z)$". Let
$S$ be the source instance for $M$ with $P^S = \{a\}$. Then the minimal solutions for $S$ under $M$
have the form $\{E(a, b_1), E(a, b_2)\}$, where $b_1, b_2$ are distinct constants. Thus, for

$$q(x) := \exists z_1 \exists z_2\, \bigl(E(x, z_1) \land E(x, z_2) \land \forall z_3\, (E(x, z_3) \to (z_3 = z_1 \lor z_3 = z_2))\bigr),$$

we have $\mathrm{cert}_{\mathrm{EGCWA}}(q, M, S) = \{a\}$. In other words, the answer $\mathrm{cert}_{\mathrm{EGCWA}}(q, M, S)$ excludes
the possibility that there are three distinct values $b_1, b_2, b_3$ with $E(a, b_i)$ for each $i \in \{1, 2, 3\}$.
But $\theta$ and $S$ explicitly mention this possibility. Thus, intuitively, $\mathrm{cert}_{\mathrm{EGCWA}}$ is inconsistent
with $M$ and $S$.
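
A small sketch illustrates why the minimal solutions alone lose the three-witness case. It simplifies by representing an instance only through the set of $E$-successors of $a$ and by using four assumed constants; the function names are hypothetical.

```python
from itertools import combinations

CONSTS = ["b1", "b2", "b3", "b4"]  # assumed finite stand-in for Const

def solutions():
    """Successor sets of a with exactly two or exactly three elements."""
    return [frozenset(c) for k in (2, 3) for c in combinations(CONSTS, k)]

def egcwa_solutions():
    """EGCWA keeps only the minimal solutions."""
    sols = solutions()
    return [T for T in sols if not any(U < T for U in sols)]

def q_holds(successors):
    """q(a): at least one and at most two distinct z with E(a, z)."""
    return 1 <= len(successors) <= 2

egcwa = egcwa_solutions()
print(all(q_holds(T) for T in egcwa))            # True, so a is an EGCWA-answer
print(q_holds(frozenset({"b1", "b2", "b3"})))    # False on a three-witness solution
```

The three-witness instance is a solution for $S$ under $M$, but it is not minimal, so the EGCWA never consults it.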

To conclude this section, let us consider the possible worlds semantics (PWS) by Chan
[7]. A natural translation of the PWS for the case of schema mappings defined by st-tgds
is as follows: Let $M = (\sigma, \tau, \Sigma)$ be a schema mapping, where $\Sigma$ is a set of st-tgds, and let
$S$ be a source instance for $M$. The definition of a PWS-solution for $S$ under $M$ can be
given in terms of justifications, as in [22]. Given a target instance $T$ for $M$ and an atom
$R(\bar{t}) \in T$, we say that $R(\bar{t})$ is justified in $T$ under $M$ and $S$ if and only if there is an st-tgd
$\forall \bar{x} \forall \bar{y}\, (\varphi(\bar{x}, \bar{y}) \to \exists \bar{z}\, \psi(\bar{x}, \bar{z}))$ in $\Sigma$, tuples $\bar{a}, \bar{b}$ over $\mathrm{dom}(S)$ with $S \models \varphi(\bar{a}, \bar{b})$, and a tuple $\bar{u}$
over $\mathrm{dom}(T)$ such that $T \models \psi(\bar{a}, \bar{u})$, and $R(\bar{t})$ is one of the atoms in $\psi(\bar{a}, \bar{u})$. A PWS-solution
for $S$ under $M$ is then a ground solution $T$ for $S$ under $M$ such that all atoms in $T$ are
justified in $T$ under $M$ and $S$. For a query $q$ over $\tau$, we let

$$\mathrm{cert}_{\mathrm{PWS}}(q, M, S) := \mathrm{cert}(q, \mathcal{I}),$$

where $\mathcal{I}$ is the set of all PWS-solutions for $S$ under $M$. However, $\mathrm{cert}_{\mathrm{PWS}}$ does not
respect logical equivalence of schema mappings, which can be easily verified using the schema
mapping, the source instance, and the query from Example 3.1.

² Example 3.3 illustrates this as well, but Example 4.9 seems to make it clearer why it may be desirable
to interpret existential quantifiers inclusively.

5. The GCWA*-Semantics 

We now introduce the GCWA*-semantics, and argue that it has the desired properties - 
being invariant under logically equivalent schema mappings, and being just "open enough" 
to interpret existential quantifiers inclusively. 

Among the semantics considered in the previous sections, the GCWA-semantics is closest
to the desired semantics. For instance, consider the schema mapping $M = (\{P\}, \{E\}, \Sigma)$,
where $\Sigma$ consists of $\forall x\, (P(x) \to \exists z\, E(x, z))$, and the source instance $S$ for $M$ with $P^S = \{a\}$
from Example 4.4. Let $\mathcal{T}$ be the set of all GCWA-solutions for $S$ under $M$. As shown in
Example 4.7, $\mathcal{T}$ consists of all target instances $T$ for $M$ such that there is a nonempty finite
set $B \subseteq \mathrm{Const}$ with $T = T_B$, where $E^{T_B} = \{(a, b) \mid b \in B\}$. The set $\mathcal{T}$ is precisely as we would
like the set of solutions to be. Intuitively, it precisely captures what is expressed by $M$ and
$S$: there is one $b \in \mathrm{Const}$ satisfying $E(a, b)$, or there are two distinct $b_1, b_2 \in \mathrm{Const}$ satisfying
$E(a, b_1)$ and $E(a, b_2)$, or there are three distinct $b_1, b_2, b_3 \in \mathrm{Const}$ satisfying $E(a, b_1)$, $E(a, b_2)$
and $E(a, b_3)$, and so on. The case that there are $n$ distinct $b_1, \dots, b_n \in \mathrm{Const}$ such that
$E(a, b_i)$ holds for each $i \in \{1, \dots, n\}$ is captured precisely by $T_B$, where $B := \{b_1, \dots, b_n\}$.
However, as we have argued in Example 4.8, with respect to other schema mappings the
GCWA is still "too open".

Note that the set $\mathcal{T}$ in the above example is the set of all ground solutions for $S$ under
$M$ that are unions of minimal solutions. Indeed, it seems to be a good idea to use the set
of all such solutions as the set of "valid solutions". As we have done in Remark 3.6, we can
express the existential quantifier in the st-tgd from Example 4.4 equivalently by an infinite
disjunction, resulting in the following $L_{\infty\omega}$-sentence:

$$\theta' := \forall x\, \Bigl(P(x) \to \bigvee_{c \in \mathrm{Const}} E(x, c)\Bigr).$$

Then the ground minimal solutions for a source instance $S$ under $M' = (\{P\}, \{E\}, \{\theta'\})$
correspond to the disjuncts of the disjunction in $\theta'$, and an inclusive interpretation of this
disjunction is guaranteed by taking all ground solutions that are unions of minimal solutions
as "valid solutions".

This can be generalized to other schema mappings defined by st-tgds. For instance,
recall the schema mapping $M$, and the source instance $S$ for $M$ from Example 4.8, where
the GCWA is "too open". Again, we can express the st-tgd $\theta$ that defines $M$ by an
$L_{\infty\omega}$-sentence where the existential quantifier in $\theta$ is replaced by an infinite disjunction:

$$\theta' := \forall x\, \Bigl(P(x) \to \bigvee_{c_1, c_2 \in \mathrm{Const}} (E(x, c_1) \land F(c_1, c_2))\Bigr).$$

Then the ground minimal solutions for $S$ under $M$ correspond to the disjuncts of the
disjunction in $\theta'$, and an inclusive interpretation of this disjunction is guaranteed by taking the
set $\mathcal{T}$ of all ground solutions for $S$ under $M$ that are unions of minimal solutions as "valid
solutions". That is, we take the set $\mathcal{T}$ of all ground target instances $T$ for $M$ such that
$E^T = \{(a, b) \mid (b, c) \in F^T \text{ for some } c \in \mathrm{Const}\}$ and $F^T \neq \emptyset$. Indeed, the set of the certain
answers to the query $q$ from Example 4.8 on $\mathcal{T}$ is nonempty, as desired.

The preceding two examples suggest answering queries by the certain answers on the
set of all ground solutions that are unions of minimal solutions. Let us call such solutions
GCWA*-solutions for the moment:

Definition 5.1 (working definition). Let $M = (\sigma, \tau, \Sigma)$ be a schema mapping, let $S$ be a
source instance for $M$, and let $q(\bar{x})$ be a query over $\tau$.

(1) A GCWA*-solution for $S$ under $M$ is a ground solution for $S$ under $M$ that is a union
of minimal solutions for $S$ under $M$.

(2) We call $\mathrm{cert}_{\mathrm{GCWA}^*}(q, M, S) := \mathrm{cert}(q, \mathcal{I})$, where $\mathcal{I}$ is the set of the GCWA*-solutions for
$S$ under $M$, the GCWA*-answers to $q(\bar{x})$ on $M$ and $S$.
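
The working definition can be tested with a simple observation: a ground solution $T$ is a union of minimal solutions exactly when the minimal solutions contained in $T$ already cover $T$. The sketch below applies this to the mapping of Example 4.8. It assumes a five-element stand-in for Const and takes the minimal ground solutions to be the instances $\{E(a, z_1), F(z_1, z_2)\}$; both assumptions, and all helper names, are made for illustration.

```python
CONSTS = ["a", "b", "c", "d", "e"]  # assumed finite stand-in for Const

def is_solution(T):
    """Example 4.8 with P^S = {a}."""
    return any(("E", "a", z1) in T and ("F", z1, z2) in T
               for z1 in CONSTS for z2 in CONSTS)

def minimal_solutions():
    """Hand-derived minimal ground solutions {E(a, z1), F(z1, z2)}."""
    return [frozenset({("E", "a", z1), ("F", z1, z2)})
            for z1 in CONSTS for z2 in CONSTS]

def is_gcwa_star_solution(T):
    """Definition 5.1: a ground solution that is a union of minimal solutions."""
    T = frozenset(T)
    parts = [M for M in minimal_solutions() if M <= T]
    return is_solution(T) and frozenset().union(*parts) == T

t_star = {("E", "a", "b"), ("F", "b", "c"), ("F", "d", "e")}
t_fixed = t_star | {("E", "a", "d")}
print(is_gcwa_star_solution(t_star))   # False: F(d, e) lacks its partner E(a, d)
print(is_gcwa_star_solution(t_fixed))  # True
```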

The definition of GCWA*-solutions and GCWA*-answers already seems to be a good
approximation to the concept of solutions, and query answering semantics, respectively, that we
would like to have. Immediately from the definitions, we obtain that the GCWA*-answers
are invariant under logically equivalent schema mappings:

Proposition 5.2. If $M_1 = (\sigma, \tau, \Sigma_1)$ and $M_2 = (\sigma, \tau, \Sigma_2)$ are logically equivalent schema
mappings, $S$ is a source instance for $M_1$ and $M_2$, respectively, and $q(\bar{x})$ is a query over $\tau$,
then $\mathrm{cert}_{\mathrm{GCWA}^*}(q, M_1, S) = \mathrm{cert}_{\mathrm{GCWA}^*}(q, M_2, S)$.

Furthermore, let us generalize the discussion from the beginning of this section to argue
that GCWA*-solutions and the GCWA*-answers as defined above are suitable for schema
mappings defined by st-tgds. Let $M = (\sigma, \tau, \Sigma)$ be such a schema mapping. Given a source
instance $S$ for $M$, let

$$\Psi_{M,S} := \{\exists \bar{z}\, \psi(\bar{u}, \bar{z}) \mid \text{there are a tgd } \forall \bar{x} \forall \bar{y}\, (\varphi(\bar{x}, \bar{y}) \to \exists \bar{z}\, \psi(\bar{x}, \bar{z})) \text{ in } \Sigma \text{ and tuples } \bar{u} \in \mathrm{Const}^{|\bar{x}|},\ \bar{v} \in \mathrm{Const}^{|\bar{y}|} \text{ such that } S \models \varphi(\bar{u}, \bar{v})\}.$$

For each ground target instance $T$ for $M$, it holds that $T$ is a solution for $S$ under $M$ if and
only if $T$ satisfies all sentences in $\Psi_{M,S}$. Let $\mathcal{T}_0$ be the set of all ground minimal solutions
for $S$ under $M$. Since all sentences in $\Psi_{M,S}$ are monotonic, $\Psi_{M,S}$ is logically equivalent (on
the set of all ground instances over $\tau$) to the sentence

$$\psi_{M,S} := \bigvee_{T_0 \in \mathcal{T}_0}\ \bigwedge_{R(\bar{t}) \in T_0} R(\bar{t}),$$

that is, for all ground instances $T$ over $\tau$, we have $T \models \psi_{M,S}$ if and only if $T$ satisfies all
sentences in $\Psi_{M,S}$. Now, $\psi_{M,S}$ tells us that there is one $T_0 \in \mathcal{T}_0$ such that all $R(\bar{t}) \in T_0$
are satisfied, or there are two $T_0 \in \mathcal{T}_0$ such that all $R(\bar{t}) \in T_0$ are satisfied, and so on. So,
intuitively, the set of all solutions that are unions of solutions from $\mathcal{T}_0$ (namely, the set of
all GCWA*-solutions for $S$ under $M$) captures what is expressed by $M$ and $S$ in the sense
as explained in the two motivating examples from the beginning of this section.
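
As a worked instance of this construction, consider again the mapping and source instance of Example 4.4, that is, the st-tgd $\forall x\, (P(x) \to \exists z\, E(x, z))$ with $P^S = \{a\}$. Writing $\Psi_{M,S}$ for the set of sentences and $\psi_{M,S}$ for the disjunction defined above (the symbol names are used here for illustration), one obtains:

```latex
\begin{align*}
\Psi_{M,S}    &= \{\exists z\, E(a, z)\}, \\
\mathcal{T}_0 &= \{\{E(a, c)\} \mid c \in \mathrm{Const}\}, \\
\psi_{M,S}    &= \bigvee_{c \in \mathrm{Const}} E(a, c).
\end{align*}
```

A ground instance satisfies $\psi_{M,S}$ if and only if it contains $E(a, c)$ for at least one constant $c$, and the finite unions of instances from $\mathcal{T}_0$ are exactly the instances $\{E(a, b) \mid b \in B\}$ for nonempty finite $B \subseteq \mathrm{Const}$, in line with Example 4.7.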

Remark 5.3. The above argumentation can be generalized to more general classes of schema
mappings. For example, let us consider schema mappings defined by a certain kind of $L_{\infty\omega}$
sentences. Let $M = (\sigma, \tau, \Sigma)$ be a schema mapping, where $\Sigma$ consists of right-monotonic
$L_{\infty\omega}$-st-tgds, which are $L_{\infty\omega}$ sentences of the form

$$\theta := \forall \bar{x}\, (\varphi(\bar{x}) \to \psi(\bar{x})),$$

where $\varphi$ is an $L_{\infty\omega}$ formula over $\sigma$, and $\psi$ is a monotonic $L_{\infty\omega}$ formula over $\tau$. We assume
that for each instance $S$ over $\sigma$, and for each instance $T$ over $\tau$, we have $S \cup T \models \theta$ if and only
if for all $\bar{a} \in (\mathrm{dom}(S) \cup \mathrm{dom}(\theta))^{|\bar{x}|}$, where $\mathrm{dom}(\theta)$ is the set of all constants that occur in $\theta$,
$S \models \varphi(\bar{a})$ implies $T \models \psi(\bar{a})$. This can be enforced, for example, by relativizing the universal
quantifiers, and the quantifiers in $\varphi$ to the active domain over $\sigma$, and by relativizing the
quantifiers in $\psi$ to the active domain over $\tau$. Note that right-monotonic $L_{\infty\omega}$-st-tgds capture
st-tgds.

Now, the above argumentation for schema mappings defined by st-tgds goes through for
$M$. The only difference is that, given a source instance $S$ for $M$, we let

$$\Psi_{M,S} := \{\psi(\bar{a}) \mid \text{there are } \forall \bar{x}\, (\varphi(\bar{x}) \to \psi(\bar{x})) \text{ in } \Sigma \text{ and } \bar{a} \in \mathrm{Const}^{|\bar{x}|} \text{ with } S \models \varphi(\bar{a})\}.$$

The remaining part goes through unchanged.

The following example shows that the GCWA*-answers can be appropriate beyond
schema mappings defined by right-monotonic $L_{\infty\omega}$-st-tgds.

Example 5.4. Recall the schema mapping $M$, the source instance $S$ for $M$, and the query $q$
from Example 4.9. For each ground target instance $T$ for $M$ that is the union of minimal
solutions for $S$ under $M$, there exists a nonempty finite set $C \subseteq \mathrm{Const}$ with $E^T = \{(a, b) \mid b \in C\}$.
$T$ is a GCWA*-solution for $S$ under $M$ if and only if $2 \leq |C| \leq 3$, as desired. Note that the
GCWA*-answers to $q$ on $M$ and $S$ are empty, as intuitively expected.

However, let $M = (\sigma, \tau, \Sigma)$ be a schema mapping, where $\Sigma$ does not entirely consist of
right-monotonic $L_{\infty\omega}$-st-tgds, and let $S$ be a source instance for $M$. Then the set of the
GCWA*-solutions for $S$ under $M$ may suppress information that should intuitively be taken
into account when answering queries:

Example 5.5. Consider the schema mapping $M = (\{P\}, \{E, F\}, \{\theta_1, \theta_2\})$, where

$$\theta_1 := \forall x\, (P(x) \to \exists z \exists z'\, E(z, z')),$$

$$\theta_2 := \forall x \forall x' \forall y\, (E(x, y) \land E(x', y) \to F(x, x')),$$

and let $S$ be the source instance for $M$ with $P^S = \{a\}$. Furthermore, let $c_1, c_2, c$ be constants
with $c_1 \neq c_2$. Then the target instance

$$T := \{E(c_1, c), E(c_2, c)\} \cup \{F(c_i, c_j) \mid 1 \leq i, j \leq 2\}$$

for $M$ is a solution for $S$ under $M$. However, it is not a GCWA*-solution for $S$ under $M$,
since every ground minimal solution for $S$ under $M$ has the form $\{E(d, d'), F(d, d)\}$ for
$d, d' \in \mathrm{Const}$.

Nevertheless, it seems natural to take into account $T$ when answering queries. Intuitively,
under an inclusive interpretation of the existential quantifiers in $\theta_1$, the st-tgd $\theta_1$ and the
atom $P(a)$ in $S$ tell us that it is possible that both $E(c_1, c)$ and $E(c_2, c)$ hold. In combination
with $\theta_2$, this tells us that it is possible that a solution contains $E(c_1, c)$, $E(c_2, c)$ and $F(c_i, c_j)$
for $i, j \in \{1, 2\}$. Therefore, $T$ should be a possible solution.

We now extend the set of GCWA*-solutions so that solutions like $T$ are included as well.
We do this using the following closure operation. Let $\mathcal{T}$ be a nonempty finite set of ground
minimal solutions for $S$ under $M$, say $\mathcal{T}$ consists of instances $T_i = \{E(d_i, e_i), F(d_i, d_i)\}$ for
$i = 1, \dots, n$. $\mathcal{T}$ represents the information that $\bigwedge_{i=1}^{n} E(d_i, e_i)$ holds, and $S$ and $\theta_1$ intuitively
tell us that this is possible. In general, $T_0 := \bigcup \mathcal{T}$ is not a solution for $S$ under $M$, since
in general it does not satisfy $\theta_2$ (it does if the constants $e_1, \dots, e_n$ are distinct). However,
we can extend $T_0$ to a solution $\hat{T}_0$ by adding atoms to $T_0$ so that $\theta_2$ is satisfied. We pick
the minimal set of such atoms, namely $\{F(d_i, d_j) \mid 1 \leq i, j \leq n,\ e_i = e_j\}$, since we do not
want to add any atoms that are not needed to satisfy $\theta_2$. Note that $\hat{T}_0$ is minimal among all
ground solutions $T'$ for $S$ under $M$ with $T' \supseteq T_0$. We add $\hat{T}_0$ to the set of "valid solutions".

The set of solutions that results from applying the closure operator contains all solutions
$\hat{T}_0$ for $S$ under $M$ of the form

$$\hat{T}_0 = \{E(d_i, e_i) \mid 1 \leq i \leq n\} \cup \{F(d_i, d_j) \mid 1 \leq i, j \leq n,\ e_i = e_j\},$$

where $n \geq 1$ and $d_1, \dots, d_n, e_1, \dots, e_n$ are arbitrary constants. Intuitively, this set precisely
captures what is expressed by $M$ and $S$.
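
For the mapping of Example 5.5, the closure step has a simple closed form, which the following sketch computes from a list of pairs $(d_i, e_i)$. The function name `close` and the triple encoding of atoms are illustrative choices, not notation from the text.

```python
def close(pairs):
    """Union of the minimal solutions {E(d, e), F(d, d)} for the given pairs,
    extended by the F-atoms that theta_2 forces: F(d_i, d_j) whenever e_i = e_j."""
    e_atoms = {("E", d, e) for (d, e) in pairs}
    f_atoms = {("F", d1, d2)
               for (d1, e1) in pairs for (d2, e2) in pairs if e1 == e2}
    return e_atoms | f_atoms

# Two minimal solutions that share the second component c.
shared = close([("c1", "c"), ("c2", "c")])
# Two minimal solutions with distinct second components: nothing extra is needed.
distinct = close([("c1", "e1"), ("c2", "e2")])

print(sorted(shared))
print(sorted(distinct))
```

With a shared second component the result is the instance $T$ of Example 5.5; with distinct second components only the reflexive atoms $F(d_i, d_i)$ appear, so the plain union is already a solution.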

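In general, we iterate the (appropriately generalized) closure operator until a fixed point
is reached. We start with the set

\[ \mathcal{T}^0_{M,S} := \{T \mid T \text{ is a ground minimal solution for } S \text{ under } M\}. \]

For each set $\mathcal{T}$ of instances, let

\[ \langle \mathcal{T} \rangle := \left\{ \bigcup \mathcal{T}' \;\middle|\; \mathcal{T}' \text{ is a nonempty finite subset of } \mathcal{T} \right\}, \]

where $\bigcup \mathcal{T}'$ denotes the union of all instances in $\mathcal{T}'$. For every $i \ge 0$, let

\[ \mathcal{T}^{i+1}_{M,S} := \mathcal{T}^{i}_{M,S} \cup \{T_0' \mid T_0' \notin \langle \mathcal{T}^{i}_{M,S} \rangle \text{ and there is a } T_0 \in \langle \mathcal{T}^{i}_{M,S} \rangle \text{ such that } T_0' \text{ is minimal among all ground solutions } T' \text{ for } S \text{ under } M \text{ with } T_0 \subseteq T'\}. \]

Intuitively, each instance $T_0' \in \mathcal{T}^{i+1}_{M,S} \setminus \mathcal{T}^{i}_{M,S}$ is a "minimal consequence" of some "fact" $T_0 \in \langle \mathcal{T}^{i}_{M,S} \rangle$ mentioned by $M$ and $S$. In Example 5.5, the instance $T$ belongs to $\mathcal{T}^{1}_{M,S} \setminus \mathcal{T}^{0}_{M,S}$.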

Note that if $\Sigma$ contains only st-tgds, or more generally, right-monotonic $L_{\infty\omega}$-st-tgds,
we have $\mathcal{T}^{i}_{M,S} = \mathcal{T}^{0}_{M,S}$ for all $i \ge 0$, and $\langle \mathcal{T}^{0}_{M,S} \rangle$ is precisely the set of all GCWA*-solutions
for $S$ under $M$. So, for schema mappings defined by st-tgds or right-monotonic $L_{\infty\omega}$-st-tgds,
we have to take into account only the GCWA*-solutions as defined earlier. For more general
schema mappings, we take into account all solutions for $S$ under $M$ that are unions of one
or more instances in

\[ \mathcal{T}^{*}_{M,S} := \bigcup_{i \ge 0} \mathcal{T}^{i}_{M,S}. \]

Definition 5.6 (GCWA*-solution, GCWA*-answers). Let $M = (\sigma,\tau,\Sigma)$ be a schema mapping,
let $S$ be a source instance for $M$, and let $q$ be a query over $\tau$.

(1) A GCWA*-solution for $S$ under $M$ is a ground solution $T$ for $S$ under $M$ that is the
union of one or more instances in $\mathcal{T}^{*}_{M,S}$.

(2) We call $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S) := \mathit{cert}(q,\mathcal{I})$, where $\mathcal{I}$ is the set of the GCWA*-solutions for
$S$ under $M$, the GCWA*-answers to $q$ on $M$ and $S$.

As before, immediately from the definitions, we obtain that the GCWA*-answers are invariant
under logically equivalent schema mappings:

Proposition 5.7. If $M_1 = (\sigma,\tau,\Sigma_1)$ and $M_2 = (\sigma,\tau,\Sigma_2)$ are logically equivalent schema
mappings, $S$ is a source instance for $M_1$ and $M_2$, respectively, and $q$ is a query over $\tau$, then
$\mathit{cert}_{\mathrm{GCWA}^*}(q,M_1,S) = \mathit{cert}_{\mathrm{GCWA}^*}(q,M_2,S)$.

Furthermore, it is easy to prove:

Proposition 5.8. Let $M = (\sigma,\tau,\Sigma)$ be a schema mapping, where $\Sigma$ consists of st-tgds and
egds, and let $S$ be a source instance for $M$. Then a target instance $T$ for $M$ is a GCWA*-solution
for $S$ under $M$ if and only if $T$ is the union of one or more ground minimal solutions
for $S$ under $M$, and $T$ satisfies all egds in $\Sigma$.

To conclude this section, we show that, with respect to schema mappings defined by 
st-tgds and egds, GCWA*-solutions can be defined in a way similar to the definition of 
GCWA-solutions. This characterization also shows that, with respect to schema mappings 
defined by st-tgds and egds, GCWA*-solutions are special GCWA-solutions. 

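Definition 5.9. For every schema mapping $M = (\sigma,\tau,\Sigma)$ and every source instance $S$ for
$M$, define the following set of $L_{\infty\omega}$ sentences over $\sigma \cup \tau$:

\[ D^{*}_{M,S} := \{R(\bar t) \to \varphi \mid R \in \sigma \cup \tau,\ \bar t \in \mathit{Const}^{\mathrm{ar}(R)}, \text{ and } \varphi \text{ is a monotonic } L_{\infty\omega} \text{ sentence over } \sigma \cup \tau \text{ that is satisfied in every minimal model } I \text{ of } D_{M,S} \text{ with } \bar t \in R^{I}\}. \]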

Proposition 5.10. Let $M = (\sigma,\tau,\Sigma)$ be a schema mapping, where $\Sigma$ is a set of st-tgds and
egds, and let $S$ be a source instance for $M$. Then for all ground target instances $T$ for $M$,
the following statements are equivalent:

(1) $T$ is a GCWA*-solution for $S$ under $M$.

(2) $S \cup T$ is a model of $D_{M,S} \cup D^{*}_{M,S}$.

Proof. (1) $\Rightarrow$ (2): Suppose that $T$ is a GCWA*-solution for $S$ under $M$. By Proposition 5.8,
$T$ is a ground solution for $S$ under $M$, and there is a set $\mathcal{T}_0$ of minimal solutions for $S$ under
$M$ such that $T = \bigcup \mathcal{T}_0$. We have to show that $I := S \cup T$ satisfies $D_{M,S} \cup D^{*}_{M,S}$.

Since $T$ is a solution for $S$ under $M$, we have $I \models D_{M,S}$. Thus, it remains to show that
$I \models D^{*}_{M,S}$.

To this end, consider an arbitrary sentence $\psi := R(\bar t) \to \varphi$ in $D^{*}_{M,S}$, and assume that
$I \models R(\bar t)$. Since $I = S \cup T$ and $T = \bigcup \mathcal{T}_0$, there is some $T_0 \in \mathcal{T}_0$ with $\bar t \in R^{S \cup T_0}$. Note that
$I_0 := S \cup T_0$ is a minimal model of $D_{M,S}$. By Definition 5.9, we thus have $I_0 \models \varphi$. Since
$I_0 \subseteq I$ and $\varphi$ is monotonic, it follows that $I \models \varphi$. Consequently, $I$ satisfies $\psi$.

(2) $\Rightarrow$ (1): Suppose that $I := S \cup T$ is a model of $D_{M,S} \cup D^{*}_{M,S}$. Since models are ground
instances by definition, it follows that $T$ is a ground solution for $S$ under $M$. To show that
$T$ is a GCWA*-solution for $S$ under $M$, it remains to construct, by Proposition 5.8, a set $\mathcal{T}_0$
of minimal solutions for $S$ under $M$ such that $T = \bigcup \mathcal{T}_0$.

Let $\mathcal{T}_0$ be the set of all minimal solutions $T_0$ for $S$ under $M$ with $T_0 \subseteq T$. We claim
that $T = \bigcup \mathcal{T}_0$. By construction, we have $\bigcup \mathcal{T}_0 \subseteq T$. Thus it remains to show that it is not
the case that $\bigcup \mathcal{T}_0 \subsetneq T$.

Suppose, to the contrary, that $\bigcup \mathcal{T}_0 \subsetneq T$. Then there are $R \in \tau$ and $\bar t \in \mathit{Const}^{\mathrm{ar}(R)}$ such
that

\[ \bar t \in R^{T} \text{ and } \bar t \notin R^{T_0} \text{ for all } T_0 \in \mathcal{T}_0. \tag{5.1} \]

On the other hand, there is at least one minimal model $I_0$ of $D_{M,S}$ with $\bar t \in R^{I_0}$. Otherwise,
$R(\bar t) \to \bot$, which is equivalent to $\neg R(\bar t)$, would be in $D^{*}_{M,S}$, so that $\bar t \notin R^{T}$, which would
contradict (5.1). Let

\[ \mathcal{I}_0 := \{I_0 \mid I_0 \text{ is a minimal model of } D_{M,S} \text{ with } \bar t \in R^{I_0}\}. \]

Then,

\[ \psi := R(\bar t) \to \varphi \quad\text{with}\quad \varphi := \bigvee_{I_0 \in \mathcal{I}_0} \ \bigwedge_{R'(\bar t') \in I_0} R'(\bar t') \]

is satisfied in every minimal model of $D_{M,S}$. Since $\varphi$ is monotonic, we thus have $\psi \in D^{*}_{M,S}$.
Furthermore, since $I \models D^{*}_{M,S}$ and $I \models R(\bar t)$, it follows that $I \models \varphi$. In particular, there must
be some $I_0 \in \mathcal{I}_0$ such that $I \models \bigwedge_{R'(\bar t') \in I_0} R'(\bar t')$, and thus, $I_0 \subseteq I$. Note that $I_0 = S \cup T_0$ for
some $T_0 \in \mathcal{T}_0$. Together with $\bar t \in R^{I_0}$ and $R \in \tau$, this implies that $\bar t \in R^{T_0}$. However, this
contradicts (5.1). Consequently, $\bigcup \mathcal{T}_0 = T$. □

Moreover, the following result translates [35, Theorem 5] from GCWA-solutions to
GCWA*-solutions, and shows that for a given schema mapping $M$ and a source instance $S$
for $M$, the set $D_{M,S} \cup D^{*}_{M,S}$ is maximally consistent in the sense that the addition of any sentence
$\psi$ of the form $R(\bar t) \to \varphi$, where $\varphi$ is a monotonic $L_{\infty\omega}$ sentence and $D_{M,S} \cup D^{*}_{M,S} \not\models \psi$,
leads to a set of formulas that is inconsistent with $D_{M,S} \cup D^{*}_{M,S}$.

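Proposition 5.11. Let $M = (\sigma,\tau,\Sigma)$ be a schema mapping, let $S$ be a nonempty source
instance for $M$, let $D := D_{M,S}$ and let $D' := D \cup D^{*}_{M,S}$.

(1) For all monotonic $L_{\infty\omega}$-sentences $\varphi$ over $\sigma \cup \tau$, we have $D \models \varphi$ if and only if $D' \models \varphi$.

(2) For all $\psi := R(\bar t) \to \varphi$, where $R(\bar t)$ is a ground atom over $\sigma \cup \tau$, $\varphi$ is an $L_{\infty\omega}$ sentence
over $\sigma \cup \tau$, and $D' \not\models \psi$:

(a) $D' \cup \{\psi\}$ has no model, or

(b) there is a monotonic $L_{\infty\omega}$-sentence $\chi$ over $\sigma \cup \tau$ such that $D' \cup \{\psi\} \models \chi$, but $D' \not\models \chi$.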

Proof. Statement (1) is obvious, so in the following we prove (2). Let $\psi$ be given. If $D' \cup \{\psi\}$
has no model, then we are done. So assume that $D' \cup \{\psi\}$ has a model. Let $\mathcal{I}_0$ be the set
of all minimal models of $D' \cup \{\psi\}$, and consider the monotonic $L_{\infty\omega}$ sentence

\[ \chi := \bigvee_{I_0 \in \mathcal{I}_0} \ \bigwedge_{R'(\bar t') \in I_0} R'(\bar t'). \]

Clearly, we have $D' \cup \{\psi\} \models \chi$. Indeed, if $I$ is a model of $D' \cup \{\psi\}$, let $I_0 \in \mathcal{I}_0$ be such
that $I_0 \subseteq I$. Then, $I_0 \models \bigwedge_{R'(\bar t') \in I_0} R'(\bar t')$, and therefore, $I_0 \models \chi$. Since $\chi$ is monotonic and
$I_0 \subseteq I$, this leads to $I \models \chi$.

Furthermore, we have $D' \not\models \chi$. For a contradiction suppose that $D' \models \chi$. Note that
there is a minimal model $I_0'$ of $D$ with $I_0' \not\models \psi$. (This follows immediately from $D^{*}_{M,S} \subseteq D'$ and
$D' \not\models \psi$, which imply $\psi \notin D^{*}_{M,S}$.) Since $I_0'$ is a minimal model of $D'$ as well, and $D' \models \chi$, we
have $I_0' \models \chi$. Thus, there is some $I_0 \in \mathcal{I}_0$ such that $I_0' \models \bigwedge_{R'(\bar t') \in I_0} R'(\bar t')$. In other words,
$I_0 \subseteq I_0'$, which implies $I_0 = I_0'$, because $I_0'$ is a minimal model of $D'$, and $I_0 \models D'$. But this
is impossible, since $I_0 \models \psi$ and $I_0' \not\models \psi$. Consequently, $D' \not\models \chi$. □

6. Data Complexity of Query Evaluation under the GCWA*-Semantics 

In this section, we study the data complexity of computing GCWA*-answers, where data
complexity means that the schema mapping and the query to be evaluated are fixed (i.e.,
not part of the input). We concentrate on schema mappings defined by st-tgds only.

Since in data exchange, the goal is to answer queries based on some materialized solution,
given a schema mapping $M = (\sigma,\tau,\Sigma)$ and a query language $L$, we are particularly interested
in whether there are algorithms $A_1, A_2$ with the following properties:

(1) $A_1$ takes a source instance $S$ for $M$ as input and computes a solution $T$ for $S$ under $M$,
and

(2) $A_2$ takes a solution $T$ computed by $A_1$ and a query $q \in L$ over $\tau$ as input and computes
$\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$.

In particular, $A_1$ takes care of the actual data exchange (dependent on the query language,
but independent of any concrete query), while $A_2$ answers queries based on some materialized
solution. At best, both $A_1$ and, for fixed $q \in L$, $A_2$ run in polynomial time.

For proving complexity lower bounds, we consider, for fixed schema mappings $M =
(\sigma,\tau,\Sigma)$ and queries $q(\bar x)$ over $\tau$, the decision problem

$\mathrm{Eval}_{\mathrm{GCWA}^*}(M,q)$

Input: a source instance $S$ for $M$, and a tuple $\bar t \in \mathit{Const}^{|\bar x|}$

Question: Is $\bar t \in \mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$?

The complexity of this problem can be seen as a lower bound on the joint complexity of
finding a solution $T$ as in step (1) above, and obtaining $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$ from $T$ as in
step (2). If, for example, $\mathrm{Eval}_{\mathrm{GCWA}^*}(M,q)$ is co-NP-complete, then (assuming P $\neq$ NP) finding $T$ is intractable,
or computing $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$ from $T$ is intractable.

We first consider the complexity of computing the GCWA*-answers to monotonic queries
and existential queries in Sections 6.1 and 6.2, and deal with the present section's main result
concerning universal queries in Section 6.3.



6.1. Monotonic Queries. For monotonic queries, all results obtained for the certain answers
semantics (see, e.g., [10] [32] [3] [26] [29] [HI [5] [6]) carry over to the GCWA*-answers
semantics:

Proposition 6.1. Let $M = (\sigma,\tau,\Sigma)$ be a schema mapping, let $S$ be a source instance for $M$,
and let $q(\bar x)$ be a monotonic query over $\tau$. Then, $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S) = \mathit{cert}_{\mathrm{OWA}}(q,M,S)$.

Proof. Since every GCWA*-solution for $S$ under $M$ is a solution for $S$ under $M$, we have
$\mathit{cert}_{\mathrm{OWA}}(q,M,S) \subseteq \mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$. To show $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S) \subseteq \mathit{cert}_{\mathrm{OWA}}(q,M,S)$,
consider a tuple $\bar t \in \mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$. We have to show that $\bar t \in q(T)$ for all solutions $T$
for $S$ under $M$. To this end, it suffices to show that $\bar t \in q(T)$ for all ground solutions $T$ for
$S$ under $M$, since nulls can be seen as special constants. Let $T$ be a ground solution for $S$
under $M$, and let $T_0$ be a minimal solution for $S$ under $M$ with $T_0 \subseteq T$. By Definition 5.6,
$T_0$ is a GCWA*-solution for $S$ under $M$, and since $\bar t \in \mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$, we have $\bar t \in q(T_0)$.
Since $q$ is monotonic and $T_0 \subseteq T$, we conclude $\bar t \in q(T)$. □

In particular, if $M$ is a schema mapping defined by st-tgds, and $q(\bar x)$ is a union of
conjunctive queries over $M$'s target schema, then Proposition 2.2 implies that there is a
polynomial time algorithm that takes a universal solution for a source instance $S$ for $M$ as
input and outputs the GCWA*-answers to $q(\bar x)$ on $M$ and $S$. Note that by Theorem 2.1, a
universal solution can be computed in polynomial time (for fixed $M$) from a given source
instance for $M$.

6.2. Existential Queries and Beyond. We now turn to existential queries, which are
FO queries of the form $q(\bar x) = \exists \bar y\, \varphi(\bar x,\bar y)$, where $\varphi$ is quantifier-free. A particular class of
existential queries are conjunctive queries with negation ($\mathrm{CQ}^{\neg}$ queries, for short), which are
queries of the form $q(\bar x) = \exists \bar y\, (L_1 \wedge \cdots \wedge L_k)$ where each $L_i$ is either a relational atom $R(\bar u)$
or the negation of a relational atom. A simple reduction from the CLIQUE problem [14]
shows that $\mathrm{Eval}_{\mathrm{GCWA}^*}(M,q)$ can be co-NP-hard for schema mappings $M$ defined by LAV
tgds and $\mathrm{CQ}^{\neg}$ queries with only one negated atom:

Proposition 6.2. There exists a schema mapping $M = (\sigma,\tau,\Sigma)$, where $\Sigma$ consists of two
LAV tgds, and a Boolean $\mathrm{CQ}^{\neg}$ query $q$ over $\tau$ with one negated atomic formula such that
$\mathrm{Eval}_{\mathrm{GCWA}^*}(M,q)$ is co-NP-complete.

Proof. Let $M = (\sigma,\tau,\Sigma)$, where $\sigma$ consists of binary relation symbols $E_0, C_0$, $\tau$ consists of
binary relation symbols $E, C, A$, and $\Sigma$ consists of the following st-tgds:

\[ \theta_1 := \forall x \forall y\, (E_0(x,y) \to E(x,y)), \]

\[ \theta_2 := \forall x \forall y\, (C_0(x,y) \to \exists z_1 \exists z_2\, (C(x,y) \wedge A(x,z_1) \wedge A(y,z_2))). \]

Furthermore, let

\[ q := \exists x \exists y \exists z_1 \exists z_2\, (C(x,y) \wedge A(x,z_1) \wedge A(y,z_2) \wedge \neg E(z_1,z_2)). \]

We show that $\mathrm{Eval}_{\mathrm{GCWA}^*}(M,q)$ is co-NP-complete by showing that the complement of
$\mathrm{Eval}_{\mathrm{GCWA}^*}(M,q)$ is NP-complete.

Membership: The complement of $\mathrm{Eval}_{\mathrm{GCWA}^*}(M,q)$ is solved by a nondeterministic Turing
machine as follows. Given a source instance $S$ for $M$, the machine needs to decide whether
$\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S) = \emptyset$, that is, whether there is a GCWA*-solution $T$ for $S$ under $M$ such
that $T \models \neg q$.
Note that

\[ \neg q \equiv \forall x \forall y \forall z_1 \forall z_2\, (C(x,y) \wedge A(x,z_1) \wedge A(y,z_2) \to E(z_1,z_2)), \tag{6.1} \]

and that the minimal ground solutions for $S$ under $M$ are all solutions of the form

\[ T_f := \{E(a,b) \mid (a,b) \in E_0^{S}\} \cup \{C(c,c') \mid (c,c') \in C_0^{S}\} \cup \{A(c,f(c)) \mid c \in \mathrm{dom}(C_0^{S})\}, \]

for some mapping $f\colon \mathrm{dom}(C_0^{S}) \to \mathit{Const}$. Hence, if $T$ is a GCWA*-solution for $S$ under $M$
with $T \models \neg q$, there is a minimal ground solution $T' \subseteq T$ for $S$ under $M$ with $T' \models \neg q$. In
particular, it suffices to decide whether there is a minimal ground solution $T$ for $S$ under $M$
such that $T \models \neg q$.

By (6.1), if $T_f \models \neg q$ for some $f\colon \mathrm{dom}(C_0^{S}) \to \mathit{Const}$, then for all $c \in \mathrm{dom}(C_0^{S})$ we have
$f(c) \in \mathrm{dom}(E_0^{S})$. Hence, in order to check whether there is a minimal ground solution $T$ for
$S$ under $M$ such that $T \models \neg q$, it suffices to guess a mapping $f\colon \mathrm{dom}(C_0^{S}) \to \mathrm{dom}(S)$, and
to check whether $T_f \models \neg q$. Clearly, this can be done by a nondeterministic Turing machine
in time polynomial in the size of $S$.
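
The guess-and-check argument can be made concrete with a deterministic brute-force sketch that simply tries every candidate mapping $f$ (so it runs in exponential time, unlike the nondeterministic machine). It relies on the observation that in $T_f$ every $c$ has exactly one $A$-successor $f(c)$, so $T_f \models \neg q$ holds precisely if $(f(x), f(y)) \in E_0^{S}$ for every $(x,y) \in C_0^{S}$. The function name and the encoding of relations as Python sets of string pairs are assumptions made for illustration.

```python
from itertools import product

def exists_falsifying_solution(e0, c0):
    """Return True iff some f: dom(C0) -> dom(S) yields T_f with T_f |= not q.

    e0, c0: the source relations E_0^S and C_0^S as sets of pairs of strings.
    """
    dom_c = sorted({c for pair in c0 for c in pair})
    dom_s = sorted({c for pair in e0 | c0 for c in pair})
    for image in product(dom_s, repeat=len(dom_c)):
        f = dict(zip(dom_c, image))  # one "guessed" mapping
        # Polynomial-time check: every C-edge must be mapped onto an E-edge.
        if all((f[x], f[y]) in e0 for x, y in c0):
            return True
    return False

triangle = {(u, v) for u in "abc" for v in "abc" if u != v}
print(exists_falsifying_solution(triangle, {("c1", "c2"), ("c2", "c1")}))
```

The inner `all(...)` test corresponds to the deterministic verification step; the surrounding loop replaces the nondeterministic guess.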

Hardness: To show that the complement of $\mathrm{Eval}_{\mathrm{GCWA}^*}(M,q)$ is NP-hard, we present a
reduction from the NP-complete CLIQUE problem [14]. The CLIQUE problem is to decide,
given an undirected graph $G = (V,E)$ without loops and a positive integer $k$, whether $G$
contains a clique of size $k$. Here, a clique in $G$ is a set $C \subseteq V$ such that for all $u,v \in C$ with
$u \neq v$ we have $\{u,v\} \in E$.

Let $G = (V,E)$ be an undirected graph without loops, and let $k \ge 1$ be an integer. If
$k = 1$, then $G$ has a clique of size $k$ if and only if $V$ is nonempty, and we can reduce $(G,k)$
to some predefined fixed source instance $S$ for $M$ such that $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S) = \emptyset$ if and
only if $V$ is nonempty (i.e., a source instance $S$ with $C_0^{S} = \emptyset$ if $V$ is nonempty, and a source
instance $S$ with $C_0^{S} = \{(c,c)\}$ for some $c \in \mathit{Const}$ if $V$ is empty).

If $k \ge 2$, we reduce $(G,k)$ to the source instance $S$ for $M$ with $E_0^{S} = E$ and $C_0^{S} =
\{(c_i,c_j) \mid 1 \le i,j \le k,\ i \neq j\}$, where $c_1,\dots,c_k$ are pairwise distinct constants that do not
occur in $V$. We claim that $G$ has a clique of size $k$ if and only if $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S) = \emptyset$.

"Only if" direction: Let $C = \{v_1,\dots,v_k\}$ be a clique of size $k$ in $G$, and let $T$ be the target
instance for $M$ with $E^{T} = E$, $C^{T} = C_0^{S}$ and $A^{T} = \{(c_i,v_i) \mid 1 \le i \le k\}$. Then $T$ is a
minimal solution for $S$ under $M$, and, by Definition 5.6, a GCWA*-solution for $S$ under
$M$. Furthermore, we have $T \not\models q$. To see this, note that for all $u,v,w_1,w_2 \in \mathrm{dom}(T)$ with
$T \models C(u,v) \wedge A(u,w_1) \wedge A(v,w_2)$, there are distinct $i,j \in \{1,\dots,k\}$ with $u = c_i$ and $v = c_j$,
so that $w_1 = v_i$ and $w_2 = v_j$. Since $v_i,v_j \in C$ and $E^{T} = E$, we thus have $T \models E(w_1,w_2)$
for all such $u,v,w_1,w_2$. Since $T$ is a GCWA*-solution for $S$ under $M$, and $T \not\models q$, we have
$\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S) = \emptyset$.

"If" direction: Suppose that $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S) = \emptyset$. Then there is a GCWA*-solution $T$ for
$S$ under $M$ with $T \not\models q$. For all $i \in \{1,\dots,k\}$, let

\[ V_i := \{v \in \mathrm{dom}(T) \mid (c_i,v) \in A^{T}\}. \]

Since $S \cup T \models \theta_2$, each $V_i$ is nonempty. Thus, there is a set $C = \{v_1,\dots,v_k\}$ such that
$v_i \in V_i$ for each $i \in \{1,\dots,k\}$. Moreover, for all $i,j \in \{1,\dots,k\}$ with $i \neq j$, we have
$\{v_i,v_j\} \in E$. To see this, observe that $T \models C(c_i,c_j) \wedge A(c_i,v_i) \wedge A(c_j,v_j)$, so that $T \not\models q$
implies $T \models E(v_i,v_j)$. It follows that $C$ is a clique in $G$ of size $k$ (since $k \ge 2$ and $G$ has no
loops). □
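
As a sanity check of the reduction for $k \ge 2$, the following Python sketch builds the source instance from a graph, constructs the target instance $T$ used in the "only if" direction from a candidate vertex list, and evaluates $q$ on $T$ directly. All function names are hypothetical, the constants $c_1,\dots,c_k$ are modelled as the strings `"c1"`, ..., `"ck"` and are assumed not to occur among the vertices, and undirected edges are stored as symmetric pairs.

```python
def reduce_clique(edges, k):
    """Source instance (E_0, C_0) for a loop-free undirected graph and k >= 2."""
    e0 = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    cs = [f"c{i}" for i in range(1, k + 1)]  # assumed fresh constants
    c0 = {(ci, cj) for ci in cs for cj in cs if ci != cj}
    return e0, c0, cs

def solution_from_vertices(e0, c0, cs, vertices):
    """Target instance with A = {(c_i, v_i)}, as in the 'only if' direction."""
    return {"E": set(e0), "C": set(c0), "A": set(zip(cs, vertices))}

def satisfies_q(t):
    """q = exists x, y, z1, z2: C(x,y) and A(x,z1) and A(y,z2) and not E(z1,z2)."""
    return any((z1, z2) not in t["E"]
               for (x, y) in t["C"]
               for (a, z1) in t["A"] if a == x
               for (b, z2) in t["A"] if b == y)

# A triangle contains a clique of size 3, so the constructed T falsifies q.
e0, c0, cs = reduce_clique([(1, 2), (2, 3), (1, 3)], 3)
print(satisfies_q(solution_from_vertices(e0, c0, cs, [1, 2, 3])))
```

If the chosen vertices do not form a clique, some pair $(v_i, v_j)$ is missing from $E$ and `satisfies_q` reports that $q$ holds in the constructed instance.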

Adding only one universal quantifier can make the problem undecidable. Specifically,
let us consider $\exists^{*}\forall$ FO queries, which are FO queries of the form $\exists x_1 \cdots \exists x_k \forall y\, \varphi$, where $\varphi$
is quantifier-free. Then we have:

Proposition 6.3. There exists a schema mapping $M = (\sigma,\tau,\Sigma)$, where $\Sigma$ consists of two
LAV tgds, and a Boolean $\exists^{*}\forall$ FO query $q$ over $\tau$ such that $\mathrm{Eval}_{\mathrm{GCWA}^*}(M,q)$ is undecidable.

Proof. Let $M = (\{R\},\{R_p,R_f\},\Sigma)$, where $R,R_p,R_f$ are ternary relation symbols, and $\Sigma$
consists of the st-tgds $\theta_1 := \forall \bar x\, (R(\bar x) \to R_p(\bar x))$ and $\theta_2 := \forall \bar x\, (R(\bar x) \to \exists \bar y\, R_f(\bar y))$. Let $\tilde q$ be a
FO query that is true in a target instance $T$ for $M$ precisely if $R_p^{T} \subseteq R_f^{T}$, and $R_f^{T}$ encodes
the graph of a total associative function $f\colon B \times B \to B$ for some set $B$:

\[ \begin{aligned} \tilde q := {} & \forall \bar x\, (R_p(\bar x) \to R_f(\bar x)) \\ & \wedge\ \forall \bar x \forall y_1 \forall y_2\, (R_f(\bar x,y_1) \wedge R_f(\bar x,y_2) \to y_1 = y_2) \\ & \wedge\ \forall x \forall y\, (\varphi_{\mathrm{dom}}(x) \wedge \varphi_{\mathrm{dom}}(y) \to \exists z\, R_f(x,y,z)) \\ & \wedge\ \forall x \forall y \forall z \forall u \forall v \forall w\, (R_f(x,y,u) \wedge R_f(u,z,v) \wedge R_f(y,z,w) \to R_f(x,w,v)), \end{aligned} \]

where, in a target instance $T$ for $M$,

\[ \varphi_{\mathrm{dom}}(x) := \exists x_1 \exists x_2 \exists x_3 \Bigl( R_f(x_1,x_2,x_3) \wedge \bigvee_{i=1}^{3} x = x_i \Bigr) \]

defines the set of all values that occur in $R_f^{T}$. Note that the last three lines in the definition
of $\tilde q$ are essentially the target constraints of the schema mapping in [27, Theorem 3.6]. Note
also that the negation of $\tilde q$ is equivalent to a Boolean $\exists^{*}\forall$ FO query. Let $q$ be this $\exists^{*}\forall$ FO
query.

To show that $\mathrm{Eval}_{\mathrm{GCWA}^*}(M,q)$ is undecidable, we reduce the

Embedding problem for finite semigroups

Input: a partial function $p\colon A^2 \to A$, where $A$ is a finite set

Question: Is there a finite set $B \supseteq A$ and a total function $f\colon B^2 \to B$ such that $f$ is
associative, and $f$ extends $p$ (i.e., $p(x,y)$ defined implies $f(x,y) = p(x,y)$)?

that is known to be undecidable [27], to $\mathrm{Eval}_{\mathrm{GCWA}^*}(M,q)$. Let $p\colon A^2 \to A$ be a partial
function, where $A$ is a finite set. Construct the source instance $S$ for $M$, where $R^{S}$ is the
graph of $p$, that is, $R^{S} = \{(a,b,c) \mid p(a,b) = c\}$. We claim that $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S) = \emptyset$ if
and only if $p$ is a "yes"-instance of the embedding problem for finite semigroups.

Note that the GCWA*-solutions for $S$ under $M$ are all the target instances $T$ for $M$ such
that $R_p^{T} = R^{S}$, and either (1) $R^{S} = R_p^{T} = R_f^{T} = \emptyset$, or (2) $R_f^{T}$ is a nonempty finite subset
of $\mathit{Const}^3$. Therefore, $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S) = \emptyset$ if and only if there is a GCWA*-solution $T$
for $S$ under $M$ such that $R_f^{T}$ is the graph of a total function $f\colon \mathrm{dom}(T)^2 \to \mathrm{dom}(T)$ that is
associative and extends $p$. This is the case precisely if $p$ is a "yes"-instance of the embedding
problem for finite semigroups. □
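
To make the three conditions checked by the query concrete (totality, associativity, and extension of $p$), the following Python sketch verifies them for a given candidate table. It only checks a proposed witness and does not search for one; the function name and the dictionary encoding of $f$ and $p$ (keys are pairs, values are results) are assumptions for illustration.

```python
from itertools import product

def is_semigroup_extension(f, p):
    """Check that the table f is total and associative on its elements and extends p."""
    elems = {x for key in f for x in key} | set(f.values())
    # Totality: f must be defined on every pair of elements of B.
    total = all((x, y) in f for x, y in product(elems, repeat=2))
    # Associativity: f(f(x, y), z) == f(x, f(y, z)); only evaluated if f is total.
    associative = total and all(f[f[x, y], z] == f[x, f[y, z]]
                                for x, y, z in product(elems, repeat=3))
    # Extension: wherever p is defined, f agrees with it.
    extends = all(f.get(key) == value for key, value in p.items())
    return total and associative and extends

add_mod_2 = {(x, y): (x + y) % 2 for x in (0, 1) for y in (0, 1)}
print(is_semigroup_extension(add_mod_2, {(0, 0): 0}))
```

The check runs in time cubic in $|B|$ for a fixed candidate, which contrasts with the undecidability of deciding whether any suitable finite $B$ and $f$ exist.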

6.3. Universal Queries. As we have seen in Section 6.2, computing GCWA*-answers to
existential queries may be a difficult task, and even more difficult (if possible at all) if the
query additionally contains universal quantifiers.

We now turn to universal queries, which are FO queries of the form $q(\bar x) = \forall \bar y\, \varphi(\bar x,\bar y)$,
where $\varphi$ is quantifier-free. As a general upper bound for such queries with respect to schema
mappings defined by st-tgds we obtain:

Proposition 6.4. Let $M = (\sigma,\tau,\Sigma)$ be a schema mapping, where $\Sigma$ consists of st-tgds, and
let $q(\bar x)$ be a universal query over $\tau$. Then, $\mathrm{Eval}_{\mathrm{GCWA}^*}(M,q)$ is in co-NP.

The proof of Proposition 6.4 uses basic ideas from the proof of this section's main result,
Theorem 6.6, and is deferred to Section 6.3.4.

In what follows, we prove that for schema mappings defined by st-tgds which are packed
as defined below, the GCWA*-answers to universal queries can even be computed in polynomial
time.

Definition 6.5 (packed st-tgd). An st-tgd $\forall \bar x \forall \bar y\, (\varphi(\bar x,\bar y) \to \exists \bar z\, \psi(\bar x,\bar z))$ is packed if for all
distinct atoms $R_1(\bar u_1), R_2(\bar u_2)$ in $\psi$, there is a variable in $\bar z$ that occurs both in $\bar u_1$ and in $\bar u_2$.

Notice that the schema mapping defined in the proof of Proposition 6.3 is defined by
packed st-tgds.

Although schema mappings defined by packed st-tgds are not as expressive as schema
mappings defined by st-tgds, they seem to form an interesting class of schema mappings.
Packed st-tgds still allow for non-trivial use of existential quantifiers in the heads of st-tgds.
For example, consider a schema mapping $M$ defined by st-tgds $\forall \bar x \forall \bar y\, (\varphi(\bar x,\bar y) \to \exists \bar z\, \psi(\bar x,\bar z))$,
where $\psi$ contains at most two atoms that contain variables from $\bar z$. Then $M$ is logically
equivalent to a schema mapping defined by packed st-tgds. To see this, let

\[ \theta := \forall \bar x \forall \bar y\, (\varphi(\bar x,\bar y) \to \exists \bar z\, \psi(\bar x,\bar z)) \]

be an st-tgd in $M$, and let $G$ be the graph whose vertices are the atoms in $\psi$, and which
has an edge between two distinct atoms if they share a variable from $\bar z$. Let $C_1,\dots,C_k$ be
the connected components of $G$, and for every $i \in \{1,\dots,k\}$ let

\[ \theta_i := \forall \bar x \forall \bar y\, (\varphi(\bar x,\bar y) \to \exists \bar z\, \psi_i), \]

where $\psi_i$ is the conjunction of all atoms in $C_i$. Then $\theta$ is logically equivalent to $\{\theta_1,\dots,\theta_k\}$.
Using that $\psi$ contains at most two atoms with variables from $\bar z$, it is easy to see that each $\theta_i$
is a packed st-tgd. As a special case, it follows that each full st-tgd is equivalent to a set of
packed st-tgds. An example of an st-tgd that is not packed is

\[ \forall x\, (P(x) \to \exists z_1 \exists z_2 \exists z_3\, (E(x,z_1) \wedge E(z_1,z_2) \wedge E(z_2,z_3))). \]
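
Both the packedness test of Definition 6.5 and the decomposition into connected components are easy to prototype. In the Python sketch below, the head $\psi$ of an st-tgd is given as a list of atoms, each a pair of a relation name and a tuple of variable names, together with the existential variables $\bar z$; this encoding and the names `is_packed` and `split_head` are assumptions made for illustration.

```python
from itertools import combinations

def shares_z(atom1, atom2, zs):
    """True iff the two atoms share an existentially quantified variable."""
    return bool(zs & set(atom1[1]) & set(atom2[1]))

def is_packed(head_atoms, existential_vars):
    """Definition 6.5: every two distinct head atoms share a variable from z."""
    zs = set(existential_vars)
    distinct = list(dict.fromkeys(head_atoms))
    return all(shares_z(a1, a2, zs) for a1, a2 in combinations(distinct, 2))

def split_head(head_atoms, existential_vars):
    """Group head atoms into the connected components C_1, ..., C_k."""
    zs = set(existential_vars)
    components = []
    for atom in dict.fromkeys(head_atoms):
        touching = [c for c in components if any(shares_z(atom, b, zs) for b in c)]
        rest = [c for c in components if not any(shares_z(atom, b, zs) for b in c)]
        # Merge every component the new atom is connected to.
        components = rest + [[b for c in touching for b in c] + [atom]]
    return components

chain = [("E", ("x", "z1")), ("E", ("z1", "z2")), ("E", ("z2", "z3"))]
print(is_packed(chain, ["z1", "z2", "z3"]))        # the non-packed example
head = [("E", ("x", "z")), ("F", ("z", "y")), ("G", ("x", "y"))]
print(split_head(head, ["z"]))                     # two components
```

For the chain example, `split_head` returns a single component with three atoms, which is itself not packed; this illustrates why the argument needs the assumption that at most two atoms contain variables from $\bar z$.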

We are now ready to state this section's main result:

Theorem 6.6. Let $M = (\sigma,\tau,\Sigma)$ be a schema mapping, where $\Sigma$ consists of packed st-tgds,
and let $q(\bar x)$ be a universal query over $\tau$. Then there is a polynomial time algorithm that,
given $\mathrm{Core}(M,S)$ for some source instance $S$ for $M$, outputs $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$.

Note that Theorem 6.6 and Theorem 2.1 immediately imply that for every schema
mapping $M$ specified by packed st-tgds, and for every universal query $q(\bar x)$ over $M$'s target
schema, there is a polynomial time algorithm that takes a source instance $S$ for $M$ as input,
and outputs $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$. In particular:

Corollary 6.7. If $M$ is a schema mapping defined by packed st-tgds, and $q(\bar x)$ is a universal
query over $M$'s target schema, then $\mathrm{Eval}_{\mathrm{GCWA}^*}(M,q)$ is in PTIME.

An interesting consequence of Theorem 6.6 is the following. Let $M$ be a schema mapping
defined by packed st-tgds, and let $S$ be a source instance for $M$. Recall from Section 2
that the OWA-answers to unions of conjunctive queries on $M$ and $S$ can be computed in
polynomial time from $\mathrm{Core}(M,S)$ (assuming $M$ and the query are fixed). In other words,
we only need to compute $\mathrm{Core}(M,S)$ in order to answer both unions of conjunctive queries,
and universal queries. As mentioned above, $\mathrm{Core}(M,S)$ can be computed in polynomial
time if $M$ is fixed.

Let us now turn to the proof of Theorem 6.6. Observe that Theorem 6.6 is an immediate
consequence of:

Theorem 6.8. Let $M = (\sigma,\tau,\Sigma)$ be a schema mapping, where $\Sigma$ consists of packed st-tgds,
and let $q(\bar x)$ be a universal query over $\tau$. Then there is a polynomial time algorithm
that, given $\mathrm{Core}(M,S)$ for some source instance $S$ for $M$, and a tuple $\bar t \in \mathit{Const}^{|\bar x|}$, decides
whether $\bar t \in \mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$.

The remaining part of this section is devoted to the proof of Theorem 6.8.

6.3.1. GCWA*-Answers and the Core. Let us first see how we can decide membership of
tuples in $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$ using $\mathrm{Core}(M,S)$. Consider a schema mapping $M = (\sigma,\tau,\Sigma)$,
where $\Sigma$ is a set of packed st-tgds, and let $q(\bar x)$ be a universal query over $\tau$. Given $\mathrm{Core}(M,S)$
and an $|\bar x|$-tuple $\bar t$ of constants, how can we decide whether $\bar t \in \mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$?

First observe that if $\bar t$ is not a tuple over $\mathrm{const}(\mathrm{Core}(M,S)) \cup \mathrm{dom}(q)$, then by the
definition of $\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$ we have $\bar t \notin \mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$. Therefore, in the following
we assume that $\bar t$ is a tuple over $\mathrm{const}(\mathrm{Core}(M,S)) \cup \mathrm{dom}(q)$. In this case, we have $\bar t \notin
\mathit{cert}_{\mathrm{GCWA}^*}(q,M,S)$ if and only if there is a GCWA*-solution $T$ for $S$ under $M$ such that
$T \models \neg q(\bar t)$. By the definition of GCWA*-solution and the fact that $\Sigma$ consists of st-tgds, the
latter is the case precisely if there is a nonempty finite set $\mathcal{T}$ of ground minimal solutions
for $S$ under $M$ with $\bigcup \mathcal{T} \models \neg q(\bar t)$. Using the following lemma, we can reformulate the last
condition in terms of $\mathrm{Core}(M,S)$.

Recall the definition of a valuation of an instance $T$, and the definition of $\mathit{poss}(T)$ from
Section 2. Then:

Lemma 6.9. Let $M = (\sigma,\tau,\Sigma)$ be a schema mapping, where $\Sigma$ consists of st-tgds, and let
$S$ be a source instance for $M$. Then the set of all ground minimal solutions for $S$ under $M$
is precisely the set of all minimal instances in $\mathit{poss}(\mathrm{Core}(M,S))$.

Proof. Let $T := \mathrm{Core}(M,S)$. We first show that every instance in $\mathit{poss}(T)$ is a ground
solution for $S$ under $M$. Let $\hat T$ be an instance in $\mathit{poss}(T)$. Then there is a valuation $v$ of $T$
with $v(T) = \hat T$. This shows that $\hat T$ is ground. To see that $\hat T$ satisfies all st-tgds in $\Sigma$, let
$\theta := \forall \bar x \forall \bar y\, (\varphi(\bar x,\bar y) \to \exists \bar z\, \psi(\bar x,\bar z))$ be an st-tgd in $\Sigma$, and let $\bar a,\bar b$ be tuples with $S \models \varphi(\bar a,\bar b)$.
Since $S \cup T \models \theta$, there is a tuple $\bar t$ with $T \models \psi(\bar a,\bar t)$, and thus $\hat T \models \psi(\bar a,v(\bar t))$. Altogether,
$\hat T$ is a ground solution for $S$ under $M$.

It remains to show that every ground minimal solution for $S$ under $M$ is in $\mathit{poss}(T)$.
Let $T_0$ be a ground minimal solution for $S$ under $M$. It is not hard to verify that there is a
valuation $v_0$ of $T^{*} := \mathrm{CanSol}(M,S)$ with $v_0(T^{*}) = T_0$ (see also [29]). Since $T^{*}$ is a universal
solution for $S$ under $M$, we have $T \cong \mathrm{Core}(T^{*})$, and thus $\iota(T) \subseteq T^{*}$ for some injective
mapping $\iota\colon \mathrm{dom}(T) \to \mathrm{dom}(T^{*})$ that is legal for $T$. Let $v := v_0 \circ \iota$. Then,

\[ v(T) = v_0(\iota(T)) \subseteq v_0(T^{*}) = T_0. \tag{6.2} \]

Note that $v(T) \in \mathit{poss}(T)$. Therefore, as shown above, $v(T)$ is a solution for $S$ under $M$.
Since $T_0$ is a minimal solution for $S$ under $M$, (6.2) implies $v(T) = T_0$. Thus, $v$ is a valuation
of $T$ with $v(T) = T_0$, which proves that $T_0 \in \mathit{poss}(T)$. □

Given $\mathrm{Core}(M,S)$ and $\bar t$, it remains to decide whether there is a nonempty finite set
$\mathcal{T}$ of minimal instances in $\mathit{poss}(\mathrm{Core}(M,S))$ such that $\bigcup \mathcal{T} \models \neg q(\bar t)$. Note that, since $q$ is
a universal query, $\neg q$ is logically equivalent to a query of the form $\exists \bar y\, \varphi(\bar x,\bar y)$. Before we
consider the general case (where $\varphi$ is an arbitrary quantifier-free query) in Section 6.3.3, the
following section deals with the case that $\bar y$ contains no variable and $\varphi$ consists of a single
atom $R(\bar u)$, where $\bar u$ is a tuple of constants. In this case, the problem simplifies to: Is there
a minimal instance in $\mathit{poss}(\mathrm{Core}(M,S))$ that contains $R(\bar u)$?

6.3.2. Finding Atoms in Minimal Instances. Let $M = (\sigma,\tau,\Sigma)$ be a schema mapping, where
$\Sigma$ consists of packed st-tgds, let $T := \mathrm{Core}(M,S)$ for some source instance $S$ for $M$, and let
$R(\bar t)$ be an atom over $\tau$. In the following, we consider the problem of testing whether there
is a minimal instance $T_0$ in $\mathit{poss}(T)$ with $R(\bar t) \in T_0$. We will often state results in a more
general form than necessary, so that we can apply those results later in the more general
setting considered in Section 6.3.3.

First note that there may be infinitely many minimal instances in $\mathit{poss}(T)$, so that it is
impossible to check all these instances. However, it suffices to consider representatives
of the minimal instances in $\mathit{poss}(T)$, where constants that do not occur in $T$ or $R(\bar t)$ are
represented by nulls in $T$. Denoting by $C$ the set of all constants in $\bar t$, the set $\mathit{min}_C(T)$ of
all such representatives is formally defined as follows:

Definition 6.10 ($\mathit{val}_C(T)$, $\mathit{min}_C(T)$). Let $T$ be an instance, and let $C \subseteq \mathit{Const}$.

(1) We write $\mathit{val}_C(T)$ for the set of all mappings $f\colon \mathrm{dom}(T) \to \mathrm{dom}(T) \cup C$ that are legal
for $T$.

(2) Let $\mathit{min}_C(T)$ be the set of all instances $\hat T$ for which there is some $f \in \mathit{val}_C(T)$ with
$\hat T = f(T)$, and there is no $f' \in \mathit{val}_C(T)$ with $f'(T) \subsetneq \hat T$.

Throughout this section, $C$ will usually be the set of constants in $\bar t$.

Proposition 6.11. Let $T$ be an instance, and let $C \subseteq \mathit{Const}$.

(1) For each $T_0 \in \mathit{poss}(T)$, the following are equivalent:

(a) $T_0$ is a minimal instance in $\mathit{poss}(T)$.

(b) There is an instance $\hat T_0 \in \mathit{min}_C(T)$ and an injective valuation $v$ of $\hat T_0$ such that
$v(\hat T_0) = T_0$, and $v^{-1}(c) = c$ for all $c \in \mathrm{dom}(T_0) \cap C$.

(2) If $T$ is a core, then $T \in \mathit{min}_C(T)$.

(3) Each instance in $\mathit{min}_C(T)$ is a core.

Proof. Ad (1): We first prove that (a) implies (b). Suppose that $T_0$ is a minimal instance in
$\mathit{poss}(T)$, and let $v_0$ be a valuation of $T$ with $v_0(T) = T_0$. Furthermore, let $\hat v\colon \mathrm{dom}(T_0) \to
\mathrm{dom}(T) \cup C$ be an injective mapping with

\[ \hat v(c) = c \text{ for each } c \in \mathrm{dom}(T_0) \cap (\mathrm{const}(T) \cup C), \tag{6.3} \]

and

\[ \hat v(c) \in \mathrm{nulls}(T) \text{ for each } c \in \mathrm{dom}(T_0) \setminus (\mathrm{const}(T) \cup C). \tag{6.4} \]

Then $f := \hat v \circ v_0 \in \mathit{val}_C(T)$, and

\[ \hat T_0 := f(T) = \hat v(v_0(T)) = \hat v(T_0). \tag{6.5} \]

Let $v$ be the inverse of $\hat v$ on $\mathrm{dom}(\hat T_0)$. Then, by (6.3)-(6.5), $v$ is an injective valuation of $\hat T_0$
such that $v(\hat T_0) = T_0$, and $v^{-1}(c) = \hat v(c) = c$ for every $c \in \mathrm{dom}(T_0) \cap C$.

It remains to show that $\hat T_0 \in \mathit{min}_C(T)$. By (6.5), we have $\hat T_0 = f(T)$, where $f \in \mathit{val}_C(T)$.
Suppose, for a contradiction, that there is an $f' \in \mathit{val}_C(T)$ with $f'(T) \subsetneq \hat T_0$. Since $v$ is
injective and $v(\hat T_0) = T_0$, we then have $v(f'(T)) \subsetneq v(\hat T_0) = T_0$, which is impossible, since
$v(f'(T)) \in \mathit{poss}(T)$, and $T_0$ is a minimal instance in $\mathit{poss}(T)$.

We next prove that (b) implies (a). Suppose that $\hat T_0 \in \mathit{min}_C(T)$ and that $v$ is an injective
valuation of $\hat T_0$ with $v(\hat T_0) = T_0$ (we will not need the restriction that $v^{-1}(c) = c$ for all
$c \in \mathrm{dom}(T_0) \cap C$). We show that $T_0$ is a minimal instance in $\mathit{poss}(T)$. To this end, let
$f \in \mathit{val}_C(T)$ be such that $f(T) = \hat T_0$. Then $v_0 := v \circ f$ is a valuation of $T$, so that

\[ T_0 = v(\hat T_0) = v(f(T)) = v_0(T) \in \mathit{poss}(T). \]

It remains, therefore, to show that there is no $T_0' \in \mathit{poss}(T)$ with $T_0' \subsetneq T_0$.

Suppose, to the contrary, that there is such a $T_0'$. Let $v_0'$ be a valuation of $T$ with
$v_0'(T) = T_0'$, and let $f' := v^{-1} \circ v_0'$, where $v^{-1}$ is the inverse of $v$ on $\mathrm{dom}(T_0)$. Since $v^{-1}$ is
an injective mapping on $\mathrm{dom}(T_0)$, we have $f'(T) = v^{-1}(v_0'(T)) = v^{-1}(T_0') \subsetneq v^{-1}(T_0) = \hat T_0$,
which is impossible, since $f' \in \mathit{val}_C(T)$ and $\hat T_0 \in \mathit{min}_C(T)$.

Ad (2): Clearly, the identity $f$ on $\mathrm{dom}(T)$ belongs to $\mathit{val}_C(T)$ and satisfies $f(T) = T$. Let
$f' \in \mathit{val}_C(T)$ be such that $f'(T) \subseteq T$. Then $f'$ is a homomorphism from $T$ to $T$, and since
$T$ is a core, we cannot have $f'(T) \subsetneq T$.

Ad (3): Let $f \in \mathit{val}_C(T)$ be such that $T_0 := f(T) \in \mathit{min}_C(T)$. For a contradiction, suppose
that $T_0$ is not a core. Let $h$ be a homomorphism from $T_0$ to $T_0$ such that $h(T_0)$ is a
core of $T_0$. Since $T_0$ is not a core, we have $h(T_0) \subsetneq T_0$. Thus, for $f' := h \circ f$, we have
$f'(T) = h(f(T)) = h(T_0) \subsetneq T_0$, which contradicts $T_0 \in \mathit{min}_C(T)$. Hence, $T_0$ is a core. □

The converse of Proposition 6.11(3) is not true, as shown by the following example:

Example 6.12. Let $T$ be an instance over $\sigma = \{E,P\}$, where $E^{T} = \{(a,\bot), (\bot,\bot')\}$ and
$P^{T} = \{b\}$. The mapping $f \in \mathit{val}_C(T)$ with $f(\bot) = a$ and $f(\bot') = b$ then yields the instance
$f(T)$, where $E^{f(T)} = \{(a,a), (a,b)\}$ and $P^{f(T)} = \{b\}$. Hence, $f(T)$ is a core. However, $f(T)$
does not belong to $\mathit{min}_C(T)$, since the mapping $f' \in \mathit{val}_C(T)$ with $f'(\bot) = f'(\bot') = a$ yields
the instance $f'(T)$ with $E^{f'(T)} = \{(a,a)\}$ and $P^{f'(T)} = \{b\}$, which is a proper subinstance
of $f(T)$.
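
For small instances, $\mathit{min}_C(T)$ from Definition 6.10 can be computed by brute force, which gives a quick way to replay Example 6.12. The Python sketch below assumes that a mapping is legal for $T$ if it fixes all constants (so only nulls are reassigned), encodes nulls as the strings `"_1"` and `"_2"`, and uses the hypothetical helper names `apply_mapping` and `min_c`; it enumerates all of $\mathit{val}_C(T)$ and is therefore exponential in the number of nulls.

```python
from itertools import product

def apply_mapping(f, atoms):
    """Apply f to every value of every atom; values not in f stay unchanged."""
    return frozenset((rel, tuple(f.get(v, v) for v in args)) for rel, args in atoms)

def min_c(atoms, nulls, extra_constants=()):
    """All images f(T), f in val_C(T), that have no proper sub-image f'(T)."""
    nulls = sorted(nulls)
    targets = sorted({v for _, args in atoms for v in args} | set(extra_constants))
    images = {apply_mapping(dict(zip(nulls, choice)), atoms)
              for choice in product(targets, repeat=len(nulls))}
    return {t for t in images if not any(other < t for other in images)}

# Example 6.12 with _1 and _2 standing for the two nulls.
T = {("E", ("a", "_1")), ("E", ("_1", "_2")), ("P", ("b",))}
f_image = apply_mapping({"_1": "a", "_2": "b"}, T)
minimal = min_c(T, {"_1", "_2"})
print(f_image in minimal)       # membership of f(T) in min_C(T)
print(frozenset(T) in minimal)  # membership of T itself, cf. Proposition 6.11(2)
```

The comparison `other < t` is Python's proper-subset test on frozensets, mirroring the condition that no $f'(T)$ is a proper subinstance.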

Note that the size of $\mathit{min}_C(T)$ can be exponential in the size of $T$, so that it is not
possible to enumerate all instances in $\mathit{min}_C(T)$ in polynomial time, given $T$ and $R(\bar t)$ as
input. To tackle this problem, we take advantage of a nice structural property of $T$ that can
be described in terms of atom blocks:

Definition 6.13 (atom block [16]). Let $T$ be an instance.

• The Gaifman graph of the atoms of $T$ is the undirected graph whose vertices are the atoms
of $T$, and which has an edge between two atoms $A,A' \in T$ if and only if $A \neq A'$, and
there is a null that occurs both in $A$ and $A'$.

• An atom block of $T$ is the set of atoms in a connected component of the Gaifman graph
of the atoms of $T$.

Note that each atom block of $T$ is a subinstance of $T$. Furthermore, for each atom block $B$
of $T$ that contains at least one null, $\mathrm{nulls}(B)$ is a block as considered in [12]. The crucial
property of $T$ is:

Lemma 6.14 ([12]). For every schema mapping $M = (\sigma,\tau,\Sigma)$, where $\Sigma$ consists of st-tgds,
there is a positive integer $b_\Sigma$ such that if $S$ is a source instance for $M$, and $B$ is an atom
block of $\mathrm{Core}(M,S)$, then $|\mathrm{nulls}(B)| \le b_\Sigma$.
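
Atom blocks in the sense of Definition 6.13 are connected components and can be computed by incremental merging. The following Python sketch is one possible way to do so; the encoding of atoms as pairs of a relation name and a tuple of values, the explicit `nulls` argument, and the function name `atom_blocks` are assumptions made for illustration.

```python
def atom_blocks(atoms, nulls):
    """Partition atoms into the components of the Gaifman graph of the atoms."""
    nulls = set(nulls)
    blocks = []
    for atom in atoms:
        atom_nulls = nulls & set(atom[1])
        # Blocks containing an atom that shares a null with the new atom.
        touching = [b for b in blocks if any(atom_nulls & set(o[1]) for o in b)]
        rest = [b for b in blocks if not any(atom_nulls & set(o[1]) for o in b)]
        blocks = rest + [{atom}.union(*touching)]
    return blocks

# A small instance over {E}; _1, _2, _3 stand for nulls, a and b for constants.
T = [("E", ("a", "b")), ("E", ("a", "_1")), ("E", ("b", "_1")),
     ("E", ("b", "_2")), ("E", ("b", "_3")), ("E", ("_2", "_3"))]
for block in atom_blocks(T, {"_1", "_2", "_3"}):
    print(sorted(block))
```

Shared constants never connect two atoms, so a ground atom such as $E(a,b)$ always forms a block of its own. The quantity bounded by Lemma 6.14 is the number of distinct nulls in a block, which here is at most 2.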

Let us come back to our initial problem: to decide whether there is a minimal instance
in $\mathit{poss}(\mathrm{Core}(M,S))$ that contains the ground atom $R(\bar t)$. Let $T := \mathrm{Core}(M,S)$, and let $C$
be the set of constants in $\bar t$. By Proposition 6.11(1) it is enough to decide whether there is
a $T_0 \in \mathit{min}_C(T)$ with $R(\bar t) \in T_0$. The following algorithm seems to accomplish this task:

(1) Compute the atom blocks of $T$.

(2) Consider the atom blocks $B$ of $T$ in turn, and

(3) if there is an instance $B_0 \in \mathit{min}_C(B)$ with $R(\bar t) \in B_0$, accept the input;
otherwise reject it.

Since, by Lemma 6.14, there is a constant $b_\Sigma$ with $|\mathrm{nulls}(B)| \le b_\Sigma$ for each atom block $B$ of
$T$, we have to consider at most $|\mathit{val}_C(B)| = |\mathrm{dom}(B) \cup C|^{|\mathrm{nulls}(B)|} \le |\mathrm{dom}(B) \cup C|^{b_\Sigma}$ mappings in step (3) to find all
the instances $B_0 \in \mathit{min}_C(B)$. Thus, the whole algorithm runs in polynomial time.

Example 6.15 below shows that this algorithm is incorrect. In particular, the example
exhibits an instance $T$ that is a core, and an atom block $B$ of $T$ such that there is an atom
$A$ of some minimal instance $B_0 \in \mathit{poss}(B)$ that is not an atom of any minimal instance in
$\mathit{poss}(T)$. Letting $C$ be the set of all constants in $A$, this implies that there is an atom of
some instance $\hat B_0 \in \mathit{min}_C(B)$ that is not an atom of any instance in $\mathit{min}_C(T)$.

Example 6.15. Let $T$ be the instance over $\{E\}$ with

\[ E^T = \{(a,b), (a,\bot), (b,\bot), (b,\bot'), (b,\bot''), (\bot',\bot'')\}, \]

and consider the atom block

\[ B = \{E(b,\bot'), E(b,\bot''), E(\bot',\bot'')\} \]

of $T$; see Figure 1 for a graph representation of $T$ and $B$. Note that $T$ is a core.

Figure 1: The instance T. The subinstance induced by the gray vertices is B.

It is not hard to see that every minimal instance in $\mathit{poss}(B)$ has one of the following forms:

(1) $\{E(b,b)\}$,

(2) $\{E(b,c), E(c,c)\}$ with $c \in \mathrm{Const} \setminus \{b\}$, or

(3) $\{E(b,c), E(b,c'), E(c,c')\}$ with $c, c' \in \mathrm{Const} \setminus \{b\}$ and $c \neq c'$.

Thus, there is a minimal instance in $\mathit{poss}(B)$ of the third form that contains $E(c,a)$ for some $c \in \mathrm{Const} \setminus \{b\}$ (replace $c'$ in (3) with $a$).

However, there is no minimal instance in $\mathit{poss}(T)$ that contains $E(c,a)$: Such an instance must be obtained from $T$ by a valuation $v$ of $T$ with $v(\bot') = c$ and $v(\bot'') = a$, since $E(\bot',\bot'')$ is the only atom in $T$ that could be the preimage of $E(c,a)$ - all other atoms either have $a$ or $b$ as their first value. However, let $v$ be a valuation of $T$ with $v(\bot') = c$ and $v(\bot'') = a$, and let $f\colon \mathrm{dom}(T) \to \mathrm{dom}(T)$ be such that $f(a) = a$, $f(b) = b$, $f(\bot') = a$ and $f(\bot) = f(\bot'') = \bot$. Then, for $v' := v \circ f$, we have

\[ v'(T) = \{E(a,b), E(b,a), E(a,v(\bot)), E(b,v(\bot))\} \subsetneq \{E(a,b), E(b,a), E(a,v(\bot)), E(b,v(\bot)), E(b,c), E(c,a)\} = v(T). \]

Thus, $v(T)$ is not minimal in $\mathit{poss}(T)$.
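The inclusion at the end of Example 6.15 can be checked mechanically. The sketch below makes assumptions that are not part of the paper: atoms are `(relation, args)` pairs, the nulls ⊥, ⊥', ⊥'' are written `_0`, `_1`, `_2`, and the value v(⊥), which the example leaves open, is fixed to a hypothetical constant `d`.

```python
def apply(mapping, instance):
    # Apply a mapping on values to every atom; unmapped values stay fixed.
    return {(rel, tuple(mapping.get(v, v) for v in args)) for rel, args in instance}

# The instance T of Example 6.15, with _0, _1, _2 standing for the three nulls.
T = {("E", ("a", "b")), ("E", ("a", "_0")), ("E", ("b", "_0")),
     ("E", ("b", "_1")), ("E", ("b", "_2")), ("E", ("_1", "_2"))}

v = {"_0": "d", "_1": "c", "_2": "a"}   # v(⊥) = d is an arbitrary choice
f = {"_1": "a", "_2": "_0"}             # f fixes a, b and ⊥

# v' := v ∘ f, restricted to the nulls of T
v_prime = {n: v.get(f.get(n, n), f.get(n, n)) for n in ("_0", "_1", "_2")}

v_T = apply(v, T)
v_prime_T = apply(v_prime, T)
is_proper_subinstance = v_prime_T < v_T
missing_atoms = v_T - v_prime_T
```

Here `missing_atoms` consists of exactly the two atoms E(b,c) and E(c,a) that the valuation v' avoids.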

It is nevertheless possible to solve our initial problem using the following approach. Let $T = \mathrm{Core}(M,S)$, let $R(\bar t)$ be a ground atom, and let $C$ be the set of all constants in $\bar t$. Our goal is to decide whether there is an instance $T_0 \in \mathit{min}_C(T)$ with $R(\bar t) \in T_0$. To this end, we identify a set $\mathcal{S} \subseteq \mathit{min}_C(T)$ of size polynomial in the size of $T$ such that $R(\bar t)$ occurs in an instance in $\mathit{min}_C(T)$ if and only if $R(\bar t)$ occurs in an instance in $\mathcal{S}$. Furthermore, we ensure that $\mathcal{S}$ can be computed in polynomial time from $T$ and $C$. To define $\mathcal{S}$, we need a few definitions.

In the following, we fix, for each instance $I$, a core $\mathrm{Core}(I) \subseteq I$, namely the output of the algorithm provided by the following lemma:

Lemma 6.16 (implicit in [12]). There is an algorithm that takes an instance $I$ as input, and outputs a core $J \subseteq I$ of $I$ in time $O(n^{b+3})$, where $n$ is the size of $I$ and $b$ is the maximum number of nulls in an atom block of $I$.

Proof. Just omit the first step of the blocks algorithm from [12]. That is, given an instance $I$, proceed as follows:

(1) Compute a list $B_1, \ldots, B_m$ of all atom blocks of $I$, and initialize $J$ to be $I$.

(2) Check whether there is a homomorphism $h$ from $J$ to $J$ such that $h$ is not injective, and there is some $i \in \{1, \ldots, m\}$ such that $h(u) = u$ for each $u \in \mathrm{dom}(J) \setminus \mathrm{nulls}(B_i)$.

(3) If such an $h$ exists, replace $J$ by $h(J)$, and go to step (2).

(4) Output $J$.

Now the lemma follows from the proof of [12, Theorem 5.9]. □
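The loop in steps (1)-(4) can be sketched by brute force in Python. This is an illustration only, not the blocks algorithm of [12], and it makes no attempt to meet the running time stated in the lemma. Assumptions not taken from the paper: atoms are `(relation, args)` pairs with string values, and the caller passes the null sets of the atom blocks as `block_nulls`. The names `core_by_blocks`, `apply` and `dom` are hypothetical.

```python
from itertools import product

def dom(instance):
    return {v for _, args in instance for v in args}

def apply(h, instance):
    # Image of an instance under a mapping that fixes all unmapped values.
    return frozenset((rel, tuple(h.get(v, v) for v in args)) for rel, args in instance)

def core_by_blocks(instance, block_nulls):
    """Shrink the instance using non-injective endomorphisms that move
    only the nulls of a single atom block (steps (2) and (3))."""
    J = frozenset(instance)
    shrunk = True
    while shrunk:
        shrunk = False
        values = sorted(dom(J))
        for nulls in block_nulls:
            movable = sorted(n for n in nulls if n in values)
            # Try every mapping of this block's nulls into dom(J).
            for image in product(values, repeat=len(movable)):
                h = dict(zip(movable, image))
                injective = len({h.get(v, v) for v in values}) == len(values)
                image_J = apply(h, J)
                if not injective and image_J <= J:
                    J, shrunk = image_J, True   # step (3): replace J by h(J)
                    break
            if shrunk:
                break
    return J   # step (4)
```

Each successful round removes at least one value from dom(J), so the loop terminates; the enumeration per block is exponential only in the number of nulls of that block.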

Given $T = \mathrm{Core}(M,S)$ and $C$ as above, we define the set $\mathcal{S}$ to be the union of the following sets $\mathit{min}_C(T,B)$ over all atom blocks $B$ of $T$.

Definition 6.17 ($\mathit{minval}_C(T,B)$, $\mathit{min}_C(T,B)$). Let $T$ be an instance, let $B$ be an atom block of $T$, let $\bar B := T \setminus B$, and let $C \subseteq \mathrm{Const}$.

(1) Let $\mathit{val}_C(T,B)$ be the set of all mappings $f \in \mathit{val}_C(T)$ such that

• $f(\bot) = \bot$ for all $\bot \in \mathrm{nulls}(\bar B)$, and

• all nulls that occur in $f(B) \setminus \bar B$ belong to $\mathrm{nulls}(B)$.

(2) Let $\mathit{minval}_C(T,B)$ be the set of all $f \in \mathit{val}_C(T,B)$ such that there is no $f' \in \mathit{val}_C(T,B)$ with $f'(T) \subsetneq f(T)$.

(3) Let $\mathit{min}_C(T,B) := \{\mathrm{Core}(f(T)) \mid f \in \mathit{minval}_C(T,B)\}$.

Using Lemma 6.16, we obtain:

Proposition 6.18. For each positive integer $\mathit{bs}$, there is a polynomial time algorithm that, given an instance $T$ such that the number of nulls in each atom block of $T$ is at most $\mathit{bs}$, and a set $C \subseteq \mathrm{Dom}$, outputs a list of all instances that occur in $\mathit{min}_C(T,B)$ for some atom block $B$ of $T$.

Proof. The algorithm is as follows: Given $T$ and $C$, first compute a list $f_1, \ldots, f_m$ of all mappings $f$ such that there is an atom block $B$ of $T$ with $f \in \mathit{minval}_C(T,B)$. This can be done in time polynomial in the size of $T$. Then, compute and output $\mathrm{Core}(f_i(T))$ for each $i \in \{1, \ldots, m\}$. By Lemma 6.16, this can be done in time $O(n^{\mathit{bs}+3})$, where $n$ is the size of $T$ (note that the number of nulls in each atom block of $f_i(T)$ is bounded by $\mathit{bs}$). □

The following Lemma 6.19 tells us that the instances in $\mathit{min}_C(T,B)$ indeed belong to $\mathit{min}_C(T)$. Before stating the lemma, let us introduce retractions. Given an instance $I$, a retraction of $I$ is a homomorphism $h$ from $I$ to $I$ such that $h(u) = u$ for all elements $u$ in the range of $h$. In particular, for all atoms $A \in h(I)$, we have $A \in I$ and $h(A) = A$. It is known that a core of $I$ is an instance $J$ for which there is a retraction $h$ of $I$ with $h(I) = J$, and there is no retraction of $J$ to a proper subinstance of $J$ ([19]). A retraction of $I$ over a set $X \subseteq \mathrm{Dom}$ is a retraction $h$ of $I$ such that $h(u) = u$ for each $u \in X \cap \mathrm{dom}(I)$.

Lemma 6.19. Let $T$ be an instance, let $B$ be an atom block of $T$, let $\bar B := T \setminus B$, and let $C \subseteq \mathrm{Const}$. Then for each $f \in \mathit{minval}_C(T,B)$, there is a retraction $h$ of $\hat T := f(T)$ over the set of the nulls of $f(B) \setminus \bar B$ such that

(1) $h(\hat T)$ is a core of $\hat T$, and

(2) $h(\hat T) \in \mathit{min}_C(T)$.

In particular, $\mathit{min}_C(T,B) \subseteq \mathit{min}_C(T)$.

Proof. Let $\mathcal{A} := f(B) \setminus \bar B$, and let $h$ be a retraction of $\hat T$ over $\mathrm{nulls}(\mathcal{A})$ such that for

\[ T_0 := h(\hat T) = h(f(T)) \tag{6.6} \]

we have:

\[ \text{There is no retraction } h' \text{ of } T_0 \text{ over } \mathrm{nulls}(\mathcal{A}) \text{ with } h'(T_0) \subsetneq T_0. \tag{6.7} \]

We show that $T_0$ is a core of $\hat T$, and that $T_0 \in \mathit{min}_C(T)$.

Step 1: $T_0$ is a core of $\hat T$.

Suppose, for a contradiction, that $T_0$ is not a core of $\hat T$. Then there is a retraction $h'$ of $T_0$ with $h'(T_0) \subsetneq T_0$. By (6.7), there is some $\bot \in \mathrm{nulls}(\mathcal{A})$ with $h'(\bot) \neq \bot$. Let $A$ be an atom in $\mathcal{A}$ that contains $\bot$. Since $h'(\bot) \neq \bot$ and $h'$ is a retraction, $\bot$ does not occur in the range of $h'$, and therefore $A$ does not occur in $h'(\mathcal{A})$. Together with $h'(\mathcal{A}) \subseteq \mathcal{A} \cup \bar B$ and $A \in \mathcal{A}$, this implies

\[ h'(\mathcal{A}) \setminus \bar B \subsetneq \mathcal{A}. \tag{6.8} \]

Consider the mapping $f'\colon \mathrm{dom}(T) \to \mathrm{dom}(T) \cup C$ defined for each $u \in \mathrm{dom}(T)$ by

\[ f'(u) := \begin{cases} h'(h(f(u))), & \text{if } u \in \mathrm{nulls}(B), \\ u, & \text{otherwise.} \end{cases} \]

Then $f' \in \mathit{val}_C(T,B)$, since $f \in \mathit{val}_C(T,B)$ and $h, h'$ are retractions. Moreover,

\[ f'(B) \subseteq h'(T_0) = h'(T_0 \setminus \bar B) \cup h'(\bar B) \subseteq h'(T_0 \setminus \bar B) \cup \bar B, \tag{6.9} \]

where the first inclusion holds due to $f'(B) = h'(h(f(B))) \subseteq h'(h(f(T)))$ and (6.6), and the last inclusion holds, because $h'$ is a retraction of $T_0$.

Observe also that

\[ T_0 \setminus \bar B \subseteq \mathcal{A}. \tag{6.10} \]

Indeed, let $A$ be an atom of $T_0$ with $A \notin \bar B$. By (6.6), we have $T_0 = h(f(T))$. Together with $f(T) = f(B) \cup f(\bar B)$, and the fact that $h$ is a retraction of $f(T)$, this implies that $A \in f(B)$ or $A \in f(\bar B)$. Note that $A \notin f(\bar B)$, because $f(\bar B) = \bar B$ and $A \notin \bar B$. Hence, $A \in f(B)$, which, together with $A \notin \bar B$, implies that $A \in \mathcal{A}$.

Consequently, we have

\begin{align*}
f'(T) &= f'(B) \cup \bar B && \text{since } T = B \cup \bar B \text{ and } f' \in \mathit{val}_C(T,B) \\
&\subseteq h'(\mathcal{A}) \cup \bar B && \text{by (6.9) and (6.10)} \\
&\subsetneq \mathcal{A} \cup \bar B && \text{by (6.8)} \\
&= f(B) \cup f(\bar B) && \text{by definition of } \mathcal{A} \text{, and } f \in \mathit{val}_C(T,B) \\
&= f(T),
\end{align*}

which contradicts $f \in \mathit{minval}_C(T,B)$. Thus, $T_0$ is a core of $\hat T$.

Step 2: $T_0 \in \mathit{min}_C(T)$.

First observe that $T_0 = f_0(T)$, where $f_0 := h \circ f$ and $f_0 \in \mathit{val}_C(T)$. It remains, therefore, to show that there is no $f' \in \mathit{val}_C(T)$ with $f'(T) \subsetneq T_0$.

Suppose, for a contradiction, that there is such a mapping $f'$. Without loss of generality, we may assume that $f'(T) \in \mathit{min}_C(T)$. Moreover,

\[ f'(T) \subsetneq T_0 = h(f(T)) \subseteq f(T) = f(B) \cup \bar B. \tag{6.11} \]

Next observe that

\[ f'(B) \setminus \bar B = f(B) \setminus \bar B. \tag{6.12} \]

Otherwise, we could use $f'$ to construct a mapping $f'' \in \mathit{val}_C(T,B)$ such that $f''(T) \subsetneq f(T)$, which is impossible, because $f \in \mathit{minval}_C(T,B)$. Indeed, assume that (6.12) is not true. Then,

\[ f'(B) \setminus \bar B \subsetneq f(B) \setminus \bar B, \tag{6.13} \]

since by (6.11) we have $f'(B) \subseteq f(B) \cup \bar B$. Let $f''\colon \mathrm{dom}(T) \to \mathrm{dom}(T) \cup C$ be such that for each $u \in \mathrm{dom}(T)$, $f''(u) = f'(u)$ if $u \in \mathrm{nulls}(B)$, and $f''(u) = u$ otherwise. Then it is not hard to see that $f'' \in \mathit{val}_C(T,B)$. Moreover, by (6.13) we have

\[ f''(T) = (f'(B) \setminus \bar B) \cup \bar B \subsetneq (f(B) \setminus \bar B) \cup \bar B = f(T), \]

as claimed.

Now, (6.12) implies that the mapping $h'\colon \mathrm{dom}(T) \to \mathrm{dom}(T) \cup C$ that is defined for each $u \in \mathrm{dom}(T)$ by $h'(u) = u$ if $u \in \mathrm{nulls}(B)$, and $h'(u) = f'(u)$ otherwise, is a homomorphism from $f(T) = f(B) \cup \bar B$ to $f'(T)$. Furthermore, by (6.11) we have $f'(T) \subseteq f(T)$, and therefore, the identity on $\mathrm{dom}(f'(T))$ is a homomorphism from $f'(T)$ to $f(T)$. This implies that $f'(T)$ and $f(T)$ are homomorphically equivalent, and therefore, their cores are isomorphic. Since $T_0$ is a core of $f(T)$ as shown in Step 1, and $f'(T)$ is a core by Proposition 6.11(3), we have $T_0 \cong f'(T)$. However, this is a contradiction to our earlier assumption that $f'(T) \subsetneq T_0$. □

Clearly, the union of the sets $\mathit{min}_C(T,B)$ over all atom blocks $B$ of $T$ does not cover the whole set of instances in $\mathit{min}_C(T)$. However, Lemma 6.23 below tells us that for each atom $A$ of some instance $T_0 \in \mathit{min}_C(T)$ there is an atom block $B$ of $T$ and an instance $T_B \in \mathit{min}_C(T,B)$ that contains an atom $A'$ isomorphic to $A$ in the following sense:

Notation 6.20. We say that two atoms $A_1, A_2$ are isomorphic, and we write $A_1 \cong A_2$, if the instances $\{A_1\}$ and $\{A_2\}$ are isomorphic.

Note that $R(u_1, \ldots, u_r)$ and $R'(u'_1, \ldots, u'_{r'})$ are isomorphic if and only if $R = R'$, $r = r'$, and for all $i, j \in \{1, \ldots, r\}$, $u_i \in \mathrm{Const}$ if and only if $u'_i \in \mathrm{Const}$, $u_i \in \mathrm{Const}$ implies $u_i = u'_i$, and $u_i = u_j$ if and only if $u'_i = u'_j$.
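The characterization just given translates directly into a check on pairs of atoms. The following Python sketch assumes, purely for illustration, that an atom is a pair `(relation, args)` and that nulls are strings starting with `_`; the function name `atoms_isomorphic` is hypothetical.

```python
def is_null(value):
    # Assumed convention for this sketch: nulls are strings starting with "_".
    return value.startswith("_")

def atoms_isomorphic(atom1, atom2):
    """Check the conditions listed after Notation 6.20."""
    (rel1, u), (rel2, w) = atom1, atom2
    if rel1 != rel2 or len(u) != len(w):
        return False
    for i in range(len(u)):
        # Constants must sit at the same positions and coincide.
        if is_null(u[i]) != is_null(w[i]):
            return False
        if not is_null(u[i]) and u[i] != w[i]:
            return False
        # The equality pattern between positions must be the same.
        for j in range(i + 1, len(u)):
            if (u[i] == u[j]) != (w[i] == w[j]):
                return False
    return True
```

For instance, E(b,⊥') and E(b,⊥'') are isomorphic, whereas E(⊥',⊥'') is not isomorphic to an atom that repeats a single null in both positions.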

Lemma 6.23 is based on the following notion of a packed atom block:

Definition 6.21 (packed atom block). An atom block $B$ of an instance is called packed if for all atoms $A, A' \in B$ with $A \neq A'$, there is a null that occurs both in $A$ and $A'$.
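As a quick illustration of Definition 6.21, the following Python sketch tests whether a given atom block is packed. The representation (atoms as `(relation, args)` pairs, nulls as strings starting with `_`) and the names `nulls_of` and `is_packed` are assumptions made only for this example.

```python
def is_null(value):
    # Assumed convention for this sketch: nulls are strings starting with "_".
    return value.startswith("_")

def nulls_of(atom):
    return {v for v in atom[1] if is_null(v)}

def is_packed(block):
    """True if every two distinct atoms of the block share at least one null."""
    atoms = list(block)
    return all(nulls_of(a) & nulls_of(b)
               for i, a in enumerate(atoms)
               for b in atoms[i + 1:])
```

Under this check, the atom block B of Example 6.15 is not packed, since E(b,⊥') and E(b,⊥'') have no null in common, although they are connected through E(⊥',⊥'').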

Immediately from the definitions, we obtain:

Proposition 6.22. If $M = (\sigma, \tau, \Sigma)$ is a schema mapping, where $\Sigma$ consists of packed st-tgds, and $S$ is a source instance for $M$, then each atom block of $\mathrm{Core}(M,S)$ is packed.

We are now ready to state the main result of the present Section 6.3.2.

Lemma 6.23. Let $T$ be an instance such that $T$ is a core, and each atom block of $T$ is packed. Let $C \subseteq \mathrm{Const}$, and let $T_0 \in \mathit{min}_C(T)$. Then for each atom $A \in T_0$, there are

(1) an atom block $B$ of $T$,

(2) an instance $T_B \in \mathit{min}_C(T,B)$,

(3) an atom $A' \in T_B$ with $A' \cong A$, and

(4) a homomorphism $h$ from $T_B$ to $T_0$ with $h(T_B) = T_0$ and $h(A') = A$.

Figure 2: The mappings $f$, $f_i$, $g_i$, $h_i$ and their relations.

Proof. Let $T$ be an instance such that $T$ is a core and each atom block of $T$ is packed. Let $C \subseteq \mathrm{Const}$, and let $T_0 \in \mathit{min}_C(T)$. Furthermore, let $B_1, \ldots, B_n$ be an enumeration of all the atom blocks of $T$. We prove the following stronger statement:

(∗) Let $i \in \{1, \ldots, n\}$. Then there is an instance $T_i \in \mathit{min}_C(T, B_i)$ and a homomorphism $h_i$ from $T_i$ to $T_0$ with $h_i(T_i) = T_0$ such that the following is true: For each atom $A \in T_0$, there is an index $j \in \{1, \ldots, n\}$ and an atom $A' \in T_j$ with $h_j(A') = A$ and $A' \cong A$.



Idea of the construction.

We start with an $f \in \mathit{val}_C(T)$ such that $f(T) = T_0$. The first step is to find mappings $f_1 \in \mathit{val}_C(T,B_1), \ldots, f_n \in \mathit{val}_C(T,B_n)$ such that each $f_i(B_i)$ is isomorphic to $f(B_i)$. This is easy since we can assign an "unused" null in $B_i$ to any null in $B_i$ that is mapped by $f$ to a null outside $B_i$.

We then "minimize" each $f_i(T)$ by picking a $g_i \in \mathit{minval}_C(T,B_i)$ with $g_i(T) \subseteq f_i(T)$. The instance $T_i$ is then defined to be the core of $g_i(T)$ (that contains $g_i(B_i) \setminus (T \setminus B_i)$). It is then not hard to define a homomorphism $h_i$ from $T_i$ to $T_0$ with $h_i(T_i) = T_0$; see Figure 2 for an illustration.

Finally, we have to show that for each atom $A \in T_0$, there is a $j \in \{1, \ldots, n\}$ and an atom $A' \in T_j$ with $h_j(A') = A$ and $A' \cong A$. This is not an immediate consequence of the construction of $T_i$ and $g_i$. To explain the problem, let us pick an atom $A \in T_0$. This atom must occur in some $f(B_j)$, possibly in more than one such $f(B_j)$. Let $j \in \{1, \ldots, n\}$ be such that $A \in f(B_j)$. It will be clear from the construction of the $f_j$ and $h_j$ that there is an atom $A' \in f_j(B_j)$ with $A' \cong A$. If $A' \in g_j(B_j)$ and $A' \notin T \setminus B_j$, we are done: in this case we will have $A' \in T_j$ and $h_j(A') = A$. However, if $A' \notin g_j(B_j)$ or $A' \in T \setminus B_j$, it may be that $T_j$ contains no atom isomorphic to $A$, or - in the case that $T_j$ contains such an atom, say $A''$ - that $h_j$ does not map $A''$ to $A$.

The extreme case is that for all $j \in \{1, \ldots, n\}$ with $A \in f(B_j)$, there is no atom $A' \in g_j(B_j)$ with $A' \notin T \setminus B_j$ and $A' \cong A$. Using the property that all of the atom blocks $B_1, \ldots, B_n$ are packed, we show that this case does not occur, thereby proving (∗). (Example 6.25 below shows that it may occur if the atom blocks are not packed.)

It is helpful here to think in terms of the following graph $G$: the nodes of $G$ are the atoms of $T$, and there is an edge from a node $A' \in B_j$ to a node $A''$ if $g_j(A') \in T \setminus B_j$ and $g_j(A') = A''$. The core of the proof can then be summarized as follows: We first show that any path in $G$ must eventually reach a node $A' \in B_j$ for some $j \in \{1, \ldots, n\}$ such that $g_j(A') \notin T \setminus B_j$. Otherwise, there would be a cycle in $G$ containing a node $A'$. This would imply that $A'' := g_j(A')$ is isomorphic to $A'$, and that $A'' \notin B_j$. But since $B_j$ is packed, this implies that $g_j$ is actually a homomorphism from $B_j$ to $T \setminus B_j$, which is impossible since $T$ is a core. Using this property, we then construct - basically by repeated application of the mappings $g_1, \ldots, g_n$ followed by a renaming of the nulls - a mapping $f' \in \mathit{val}_C(T)$ such that $T' := f'(T) \subseteq T_0$. If there are no $j \in \{1, \ldots, n\}$ and $A' \in T_j$ with $h_j(A') = A$ and $A' \cong A$, we will have $A \notin T'$, which would mean that $T' \subsetneq T_0$ and contradict $T_0 \in \mathit{min}_C(T)$. Consequently, there is a $j \in \{1, \ldots, n\}$ and an atom $A' \in T_j$ with $h_j(A') = A$ and $A' \cong A$, which proves (∗).

The details.

Let $f \in \mathit{val}_C(T)$ be such that $f(T) = T_0$. The construction of the instances $T_i$ and the homomorphisms $h_i$ proceeds in three steps. First, we "split" $f$ into mappings $f_1 \in \mathit{val}_C(T,B_1), \ldots, f_n \in \mathit{val}_C(T,B_n)$ such that each $f_i(B_i)$ is isomorphic to $f(B_i)$. Second, we use these mappings to construct the instances $T_i$ and the homomorphisms $h_i$. Third, we show that for each atom $A \in T_0$, there is a $j \in \{1, \ldots, n\}$ and an atom $A' \in T_j$ with $h_j(A') = A$ and $A' \cong A$.

Step 1: Construction of $f_1, \ldots, f_n$.

Let $i \in \{1, \ldots, n\}$. We construct a mapping $f_i \in \mathit{val}_C(T, B_i)$ and an injective homomorphism $r_i$ from $f_i(B_i)$ to $f(B_i)$ with $r_i(f_i(B_i)) = f(B_i)$ as follows. Pick an injective mapping

\[ \bar r_i\colon \mathrm{dom}(f(B_i)) \to \mathrm{const}(f(B_i)) \cup \mathrm{nulls}(B_i) \]

such that $\bar r_i(c) = c$ for each $c \in \mathrm{const}(f(B_i))$, and $\bar r_i(\bot) \in \mathrm{nulls}(B_i)$ for each $\bot \in \mathrm{nulls}(f(B_i))$. Then define $f_i\colon \mathrm{dom}(T) \to \mathrm{dom}(T) \cup C$ such that for each $u \in \mathrm{dom}(T)$,

\[ f_i(u) := \begin{cases} \bar r_i(f(u)), & \text{if } u \in \mathrm{nulls}(B_i), \\ u, & \text{otherwise.} \end{cases} \]

By construction, we have $f_i \in \mathit{val}_C(T, B_i)$. Furthermore, for each atom $A$ of $f(B_i)$, we have $\bar r_i(A) \cong A$. In particular, each atom of $f(B_i)$ is isomorphic to an atom of $f_i(B_i)$, and vice versa. Let $r_i$ be the inverse of $\bar r_i$ on $\mathrm{dom}(f_i(B_i))$. Then, $r_i$ is an injective homomorphism from $f_i(B_i)$ to $f(B_i)$ with

\[ r_i(f_i(B_i)) = r_i(\bar r_i(f(B_i))) = f(B_i). \tag{6.14} \]

In particular,

\[ r_i(A) \cong A \text{ for all atoms } A \in f_i(B_i). \tag{6.15} \]

Step 2: Construction of the instances $T_i$ and the homomorphisms $h_i$.

Let $i \in \{1, \ldots, n\}$, and pick $g_i \in \mathit{minval}_C(T, B_i)$ with $g_i(T) \subseteq f_i(T)$. By Lemma 6.19, there is a retraction $h'_i$ of $g_i(T)$ over the set of the nulls of $g_i(B_i) \setminus (T \setminus B_i)$ such that

\[ T_i := h'_i(g_i(T)) \in \mathit{min}_C(T). \tag{6.16} \]

Define $\bar h_i\colon \mathrm{dom}(g_i(T)) \to \mathrm{dom}(T_0)$ such that for each $u \in \mathrm{dom}(g_i(T))$,

\[ \bar h_i(u) := \begin{cases} r_i(u), & \text{if } u \in \mathrm{dom}(g_i(T) \setminus (T \setminus B_i)), \\ f(u), & \text{otherwise.} \end{cases} \]

Note that $r_i$ is defined for all values that occur in $\mathrm{dom}(g_i(T) \setminus (T \setminus B_i))$, since $g_i(T) \subseteq f_i(T) = f_i(B_i) \cup (T \setminus B_i)$, and therefore,

\[ g_i(T) \setminus (T \setminus B_i) \subseteq f_i(B_i). \tag{6.17} \]

Furthermore, using (6.17) and (6.14),

\[ \bar h_i(g_i(T) \setminus (T \setminus B_i)) = r_i(g_i(T) \setminus (T \setminus B_i)) \subseteq r_i(f_i(B_i)) = f(B_i) \subseteq T_0, \]

and

\[ \bar h_i(T \setminus B_i) = f(T \setminus B_i) \subseteq f(T) = T_0, \]

which yields $\bar h_i(g_i(T)) \subseteq T_0$. In particular,

\[ \bar h_i(h'_i(g_i(T))) \subseteq \bar h_i(g_i(T)) \subseteq T_0. \]

Since $\bar h_i \circ h'_i \circ g_i \in \mathit{val}_C(T)$ and $T_0 \in \mathit{min}_C(T)$, we have $\bar h_i(h'_i(g_i(T))) = T_0$, and hence,

\[ \bar h_i(T_i) = \bar h_i(h'_i(g_i(T))) = T_0. \]

Let $h_i$ be the restriction of $\bar h_i$ to $\mathrm{dom}(T_i)$. Then, clearly, $h_i$ is a homomorphism from $T_i$ to $T_0$ with $h_i(T_i) = T_0$.

Step 3: For each $A \in T_0$ there are $j \in \{1, \ldots, n\}$ and $A' \in T_j$ with $h_j(A') = A$ and $A' \cong A$.

Let

\[ T^* := \bigcup_{i=1}^{n} \bigl(g_i(B_i) \setminus (T \setminus B_i)\bigr). \]

To prove (∗), it suffices to show that there is a mapping $r\colon \mathrm{dom}(T^*) \to \mathrm{dom}(T_0)$ with

(1) $r(T^*) = T_0$,

(2) $r(A') \cong A'$ for each $A' \in T^*$, and

(3) $r(A') = h_i(A')$ for each $i \in \{1, \ldots, n\}$ and each $A' \in g_i(B_i) \setminus (T \setminus B_i)$.

Indeed, let $A \in T_0$. Since $r(T^*) = T_0$ by condition (1), there is some $A' \in T^*$ with $r(A') = A$. So, by the construction of $T^*$, there is an $i \in \{1, \ldots, n\}$ such that $A' \in g_i(B_i) \setminus (T \setminus B_i) \subseteq T_i$. Condition (3) then yields $h_i(A') = r(A') = A$, and since $r(A') \cong A'$ by condition (2), we have $A' \cong A$.

Define $r\colon \mathrm{dom}\bigl(\bigcup_{i=1}^{n} f_i(B_i)\bigr) \to \mathrm{dom}(T_0)$ such that

\[ r(u) = r_i(u) \text{ for all } i \in \{1, \ldots, n\} \text{ and } u \in \mathrm{dom}(f_i(B_i)). \]

This is well-defined, since $\mathrm{nulls}(f_i(B_i)) \cap \mathrm{nulls}(f_j(B_j)) = \emptyset$ for all distinct $i, j \in \{1, \ldots, n\}$, and each $r_i$ is the identity on constants. We claim that $r$ satisfies conditions (1)-(3) above.

To see that $r$ satisfies condition (2), let $A' \in T^*$. Then there is some $i \in \{1, \ldots, n\}$ with $A' \in g_i(B_i) \setminus (T \setminus B_i)$. By (6.15) and (6.17), we thus have $r(A') = r_i(A') \cong A'$.

To see that $r$ satisfies condition (3), let $i \in \{1, \ldots, n\}$ and $A' \in g_i(B_i) \setminus (T \setminus B_i)$. Then, $r(A') = r_i(A') = h_i(A')$, where the last equality follows from the construction of $h_i$.

It thus remains to show that $r$ satisfies condition (1), that is, $r(T^*) = T_0$. Note that, by (6.17), we have

\[ T^* \subseteq \bigcup_{i=1}^{n} f_i(B_i). \]

Hence,

\[ r(T^*) \subseteq r\Bigl(\bigcup_{i=1}^{n} f_i(B_i)\Bigr) = \bigcup_{i=1}^{n} r(f_i(B_i)) = \bigcup_{i=1}^{n} r_i(f_i(B_i)) = \bigcup_{i=1}^{n} f(B_i) = T_0. \]

To show that $r(T^*) = T_0$, we show that there is some $f^* \in \mathit{val}_C(T)$ with $f^*(T) = T^*$. Then, $f' := r \circ f^* \in \mathit{val}_C(T)$. Since $T_0 \in \mathit{min}_C(T)$ and $f'(T) = r(f^*(T)) = r(T^*) \subseteq T_0$, this implies $r(T^*) = T_0$, and the proof is complete.

Thus, it remains to show that there is a mapping $f^* \in \mathit{val}_C(T)$ with $f^*(T) = T^*$. Basically, $f^*$ is obtained by repeated application of the mappings $g_1, \ldots, g_n$.

Let us first modify $g_1, \ldots, g_n$ as follows. Choose an arbitrary "renaming" of the nulls of $T$. That is, pick an injective mapping $\rho\colon \mathrm{dom}(T) \to \mathrm{const}(T) \cup (\mathrm{Null} \setminus \mathrm{nulls}(T))$ such that $\rho(c) = c$ for each constant $c \in \mathrm{const}(T)$. Note that $\rho$ maps each null of $T$ to a unique null that does not occur in $T$. Let

\[ X := \rho(\mathrm{nulls}(T)). \]

For each $i \in \{1, \ldots, n\}$ then define $\bar g_i\colon \mathrm{dom}(T) \cup C \cup X \to \mathrm{dom}(T) \cup C \cup X$ such that for each $u \in \mathrm{dom}(T) \cup C \cup X$,

\[ \bar g_i(u) := \begin{cases} g_i(\rho^{-1}(u)), & \text{if } u \in \rho(\mathrm{nulls}(B_i)) \text{ and } g_i(\rho^{-1}(u)) \in \mathrm{nulls}(B_i), \\ \rho(g_i(\rho^{-1}(u))), & \text{if } u \in \rho(\mathrm{nulls}(B_i)) \text{ and } g_i(\rho^{-1}(u)) \notin \mathrm{nulls}(B_i), \\ u, & \text{otherwise.} \end{cases} \]

Note that for each $i \in \{1, \ldots, n\}$,

\[ \bar g_i(\rho(B_i)) \setminus \rho(T) = g_i(B_i) \setminus (T \setminus B_i). \tag{6.18} \]

Now let

\[ g := \bar g_n \circ \cdots \circ \bar g_2 \circ \bar g_1. \]

Recall the graph $G$ mentioned at the beginning of the proof, on page 36. Then an application of $g$ to an atom $\rho(A)$ with $A \in B_{i_1}$ corresponds to following the maximal path in $G$ that starts in $A$ and proceeds to atoms $A' \in B_{i_2}$, $A'' \in B_{i_3}, \ldots$ with $i_1 < i_2 < i_3 < \cdots$. If $A''' \in B_j$ is the endpoint of this path, then either $g_j(A''') \notin T \setminus B_j$ and $g(\rho(A)) = \bar g_j(\rho(A''')) = g_j(A''')$, or $g_j(A''') \in T \setminus B_j$ and $g(\rho(A)) = \rho(A''')$.

For each $s \ge 0$ let

\[ g^s := \begin{cases} \rho, & \text{if } s = 0, \\ g \circ g^{s-1}, & \text{if } s \ge 1. \end{cases} \]






We show by induction that

\[ g^1(T) \supseteq g^2(T) \supseteq g^3(T) \supseteq \cdots. \tag{6.19} \]

To prove $g^1(T) \supseteq g^2(T)$, let $A \in g^2(T)$. If $A \in T^*$, then by (6.18), we have $A \in g^1(T)$. Otherwise, if $A \in \rho(T)$, there is an $A' \in g^1(T)$ with $g(A') = A$ and $A' \in \rho(T)$. Since $A' \in \rho(T)$, we have $A \in g(\rho(T)) = g^1(T)$, as desired. To prove $g^{i+1}(T) \supseteq g^{i+2}(T)$ for $i \ge 1$, let $A \in g^{i+2}(T)$. Then there is an $A' \in g^{i+1}(T)$ with $g(A') = A$. Since $g^{i+1}(T) \subseteq g^i(T)$ by the induction hypothesis, we have $A \in g(g^i(T)) = g^{i+1}(T)$, as desired.

By (6.19) and since $g^1(T)$ is finite, there is an $s_0 \ge 1$ such that $g^{s_0}(T) = g^s(T)$ for each $s \ge s_0$. Let $f^* := g^{s_0}$. We show that $f^*(T) = T^*$.

First observe that

\[ T^* = \bigcup_{i=1}^{n} \bigl(g_i(B_i) \setminus (T \setminus B_i)\bigr) = g^1(T) \setminus \rho(T) = f^*(T) \setminus \rho(T). \]

To see that $T^*$ is not a proper subinstance of $f^*(T)$, we show that $f^*(T)$ contains no atoms from $\rho(T)$.

For a contradiction, suppose that $f^*(T)$ contains an atom $A \in \rho(T)$. Then, $A \in \rho(B_i)$ for some $i \in \{1, \ldots, n\}$. Since $g(f^*(T)) = f^*(T)$, we know that $g$ is a bijection on $\mathrm{dom}(f^*(T))$. Furthermore, since $g$ is the identity on $\mathrm{dom}(T) \cup C$, we have $g(\bot) \in X$ for each $\bot \in X$. It follows that

\[ \bar g_i(A) \cong A, \text{ and } \bar g_i(A) \in \rho(B_j) \text{ for some } j \in \{1, \ldots, n\} \setminus \{i\}. \]

Let $A' := \rho^{-1}(A)$. By the construction of $\bar g_i$, we have

\[ g_i(A') \cong A', \text{ and } g_i(A') \in B_j \text{ for some } j \in \{1, \ldots, n\} \setminus \{i\}. \]

Since $B_i$ is packed and $g_i$ maps each null in $A'$ to a null in $B_j$, each atom in $g_i(B_i)$ contains a null from $B_j$. Together with $g_i \in \mathit{val}_C(T,B_i)$, this implies $g_i(B_i) \subseteq B_j$. In other words, $g_i$ is a homomorphism from $T$ to $T \setminus B_i$, which contradicts the fact that $T$ is a core. Consequently, we must have $f^*(T) = T^*$. □

Corollary 6.24. Let $T$ be an instance such that $T$ is a core, and each atom block of $T$ is packed. Let $C \subseteq \mathrm{Const}$, and let $A$ be an atom. Then the following statements are equivalent:

• There is an instance in $\mathit{min}_C(T)$ that contains an atom isomorphic to $A$.

• There is an atom block $B$ of $T$ such that some instance in $\mathit{min}_C(T,B)$ contains an atom isomorphic to $A$.

The following polynomial time algorithm for deciding whether $R(\bar t)$ occurs in some minimal instance in $\mathit{poss}(T)$ immediately suggests itself. Let $C$ be the set of all constants in $\bar t$. Consider each atom block $B$ of $T$, and each $T_0 \in \mathit{min}_C(T,B)$ in turn, and accept the input if and only if $R(\bar t) \in T_0$ for some $T_0$. By Proposition 6.18, the instances $T_0$ can be computed in polynomial time.

The following example shows that the proof of Lemma 6.23 fails if $T$ contains atom blocks that are not packed.

Example 6.25. Let $E$ be a binary relation symbol, and consider the instance $T$ over $\{E\}$ with

\[ E^T = \{(\bot_1,a), (\bot_1,b), (\bot_1,\bot'_1), (\bot'_1,c), (\bot_2,a), (\bot_2,b), (\bot_2,\bot'_2), (c,\bot'_2)\}. \]

Note that $T$ is a core, and that $T$ has the two atom blocks

\[ B_1 = \{E(\bot_1,a), E(\bot_1,b), E(\bot_1,\bot'_1), E(\bot'_1,c)\}, \text{ and} \]
\[ B_2 = \{E(\bot_2,a), E(\bot_2,b), E(\bot_2,\bot'_2), E(c,\bot'_2)\}. \]

See Figure 3 for a graph representation of $T$ and its two atom blocks $B_1$ and $B_2$. Note also that neither $B_1$ nor $B_2$ is packed.

Figure 3: The instance $T$, and the two atom blocks $B_1$ and $B_2$ of $T$, which are the subinstances induced by the vertices in the corresponding dashed rectangles.

Consider $f \in \mathit{val}_\emptyset(T)$ with $f(\bot_1) = f(\bot_2) = a$ and $f(\bot'_1) = f(\bot'_2) = b$. Then it is not hard to see that

\[ f(T) = \{E(a,a), E(a,b), E(b,c), E(c,b)\} \in \mathit{min}_\emptyset(T). \]

Furthermore, for the mappings $f_i$ created in the proof of Lemma 6.23 we have

\[ f_1(T) = \{E(a,a), E(a,b), E(b,c), E(\bot_2,a), E(\bot_2,b), E(\bot_2,\bot'_2), E(c,\bot'_2)\} \]

and

\[ f_2(T) = \{E(a,a), E(a,b), E(c,b), E(\bot_1,a), E(\bot_1,b), E(\bot_1,\bot'_1), E(\bot'_1,c)\}. \]

For $g_i \in \mathit{val}_\emptyset(T, B_i)$ with $g_i(\bot_i) = \bot_{3-i}$ and $g_i(\bot'_i) = b$, it holds that $g_i \in \mathit{minval}_\emptyset(T, B_i)$, and moreover,

\[ g_1(T) = \{E(\bot_2,a), E(\bot_2,b), E(\bot_2,\bot'_2), E(c,\bot'_2), E(b,c)\} \in \mathit{min}_\emptyset(T), \]
\[ g_2(T) = \{E(\bot_1,a), E(\bot_1,b), E(\bot_1,\bot'_1), E(\bot'_1,c), E(c,b)\} \in \mathit{min}_\emptyset(T). \]

Note that $E(a,a)$ and $E(a,b)$ occur in $f_i(B_i)$, but neither $g_1(T)$ nor $g_2(T)$ contains $E(a,a)$ or $E(a,b)$.



6.3.3. Proof of Theorem 6.8. This section finally proves Theorem 6.8. Let $M = (\sigma, \tau, \Sigma)$ be a schema mapping, where $\Sigma$ consists of packed st-tgds, and let $q(\bar x)$ be a universal query over $\tau$. We show that there is a polynomial time algorithm that, given as input an instance $T := \mathrm{Core}(M,S)$ for some source instance $S$ for $M$, and a tuple $\bar t \in \mathrm{Const}^{|\bar x|}$, decides whether

\[ \bar t \in \mathit{cert}_{\mathrm{GCWA}^*}(q, M, S). \]

As shown in Section 6.3.1, we can assume that $\bar t$ is a tuple over $\mathrm{const}(T) \cup \mathrm{dom}(q)$, and that in this case we have $\bar t \notin \mathit{cert}_{\mathrm{GCWA}^*}(q, M, S)$ if and only if there is a nonempty finite set $\mathcal{T}$ of minimal instances in $\mathit{poss}(T)$ such that $\bigcup \mathcal{T} \models \neg q(\bar t)$. Now observe that $\neg q$ is logically equivalent to a query $q'$ of the form

\[ q'(\bar x) = \bigvee_{i=1}^{m} q_i(\bar x), \]

where each $q_i$ is an existential query of the form

\[ q_i(\bar x) = \exists \bar y_i \bigwedge_{j=1}^{n_i} \varphi_{i,j}, \]

and each $\varphi_{i,j}$ is an atomic FO formula or the negation of an atomic FO formula. Indeed, since $q$ is a universal query, we have $\neg q \equiv \exists \bar y\, \varphi(\bar x, \bar y)$, where $\varphi$ is quantifier-free. By transforming $\varphi$ into "disjunctive normal form", we obtain a query of the form $\exists \bar y \bigvee_{i=1}^{m} \bigwedge_{j=1}^{n_i} \varphi_{i,j}$, where each $\varphi_{i,j}$ is an atomic FO formula or the negation of an atomic FO formula. By moving existential quantifiers inwards, we finally obtain $q'$. It remains therefore to decide whether there is some $i \in \{1, \ldots, m\}$ and a nonempty finite set $\mathcal{T}$ of minimal instances in $\mathit{poss}(T)$ such that $\bigcup \mathcal{T} \models q_i(\bar t)$.

Fix some constant $\mathit{bs}$ as in Lemma 6.14. Then, for each atom block $B$ of $\mathrm{Core}(M,S)$, we have $|\mathrm{nulls}(B)| \le \mathit{bs}$. Furthermore, Proposition 6.22 tells us that each atom block of $\mathrm{Core}(M,S)$ is packed. We can now use the following algorithm to decide, given as input an instance $T := \mathrm{Core}(M,S)$ for some source instance $S$ for $M$, and a tuple $\bar t \in \mathrm{Const}^{|\bar x|}$, whether $\bar t \in \mathit{cert}_{\mathrm{GCWA}^*}(q, M, S)$:

(1) Determine the atom blocks of $T$ and check whether each atom block $B$ of $T$ is packed and satisfies $|\mathrm{nulls}(B)| \le \mathit{bs}$; if not, reject the input.

(2) Check whether $T$ is a core; if not, reject the input.

(3) For each $i \in \{1, \ldots, m\}$:

(a) Check whether there is a nonempty finite set $\mathcal{T}$ of minimal instances in $\mathit{poss}(T)$ such that $\bigcup \mathcal{T} \models q_i(\bar t)$.

(b) If such a $\mathcal{T}$ exists, reject the input.

(4) Accept the input.

Step (1) clearly runs in polynomial time, and step (2) can be implemented in polynomial time using the algorithm from Lemma 6.16 (that algorithm outputs $T$ if and only if $T$ is a core). Lemma 6.26 below tells us that step (3a) can be implemented in polynomial time as well. Thus, once Lemma 6.26 is proved, the proof of Theorem 6.8 is complete.

Lemma 6.26. Let $q(\bar x) = \exists \bar y\, \varphi(\bar x, \bar y)$ be an FO query over $\tau$, where $\varphi$ is a conjunction $\bigwedge_i \varphi_i$, and each $\varphi_i$ is an atomic FO formula or the negation of an atomic FO formula. For each positive integer $\mathit{bs}$, there is a polynomial time algorithm that decides:

COREEVAL$_{\tau,\mathit{bs}}$

Input: an instance $T$ over $\tau$ such that $T$ is a core and each atom block of $T$ is packed and contains at most $\mathit{bs}$ nulls; and a tuple $\bar t \in \mathrm{Const}^{|\bar x|}$

Question: Is there a nonempty finite set $\mathcal{T}$ of minimal instances in $\mathit{poss}(T)$ such that $\bigcup \mathcal{T} \models q(\bar t)$?

The remaining part of this section is devoted to a proof of Lemma 6.26.

Let $q(\bar x) = \exists \bar y\, \varphi(\bar x, \bar y)$ be as in the hypothesis of Lemma 6.26. Without loss of generality, there is no variable that occurs both in $\bar x$ and in $\bar y$, and $\varphi$ has the form $\bigwedge_i \varphi_i$, where each $\varphi_i$ is a relational atomic FO formula, the negation of a relational atomic FO formula, or the negation of an equality. Let $\mathit{bs}$ be a positive integer.

Suppose we are given an instance $T$ over $\tau$, where $T$ is a core, each atom block of $T$ is packed, and each atom block of $T$ contains at most $\mathit{bs}$ nulls, and a tuple $\bar t \in \mathrm{Const}^{|\bar x|}$. In a first step, we rewrite $\varphi$ to a formula $\psi$ by replacing each variable $x$ in $\bar x$ with the corresponding constant assigned to $x$ by $\bar t$. That is, if $\bar x = (x_1, \ldots, x_k)$ and $\bar t = (t_1, \ldots, t_k)$, then $\psi$ is obtained from $\varphi$ by replacing, for each $i \in \{1, \ldots, k\}$, each occurrence of the variable $x_i$ in $\varphi$ by $t_i$. Let

\[ \hat q := \exists \bar y\, \psi(\bar y). \]

To check whether there is a nonempty finite set $\mathcal{T}$ of minimal instances in $\mathit{poss}(T)$ such that $\bigcup \mathcal{T} \models q(\bar t)$, it suffices to check whether there is a nonempty finite set $\mathcal{T}$ of minimal instances in $\mathit{poss}(T)$ such that $\bigcup \mathcal{T} \models \hat q$.

Suppose that $\psi$ has the form

\[ \psi(\bar y) = \bigwedge_{i=1}^{k} R_i(\bar x_i) \wedge \bigwedge_{i=1}^{l} \neg Q_i(\bar w_i) \wedge \bigwedge_{i=1}^{m} \neg\, v_i = v'_i. \]

Let $C$ be the set of constants that occur in $\psi$, and for each $i \in \{1, \ldots, k\}$, let $X_i$ be the set of all variables in $\bar x_i$. Given an assignment $\alpha$ for a set $X$ of variables, and a tuple $\bar t$ over $X \cup \mathrm{Const}$, we sloppily write $\alpha(\bar t)$ for the tuple obtained from $\bar t$ by replacing each occurrence of each variable $x \in X$ in $\bar t$ with $\alpha(x)$.

The idea for finding a nonempty finite set $\mathcal{T}$ of minimal instances in $\mathit{poss}(T)$ with $\bigcup \mathcal{T} \models \hat q$ is as follows. In the first step, we compute, for each $i \in \{1, \ldots, k\}$, the set of all pairs $(T_i, \alpha_i)$ such that $T_i \in \mathit{min}_C(T,B)$ for some atom block $B$ of $T$, and $\alpha_i(\bar x_i) \in R_i^{T_i}$. Thus, modulo renaming of values that do not occur in $\mathrm{const}(T) \cup C$, we enumerate the possible assignments $\alpha_i$ of $X_i$ under which $R_i(\bar x_i)$ is satisfied in some minimal instance in $\mathit{poss}(T)$; the instance $T_i$ can then be considered as a witness to this fact. In the second step, we try to join the pairs $(T_1, \alpha_1), \ldots, (T_k, \alpha_k)$, where each $(T_i, \alpha_i)$ is a pair computed for $i$ in the first step, to a single pair $(\tilde T, \alpha)$ such that $\tilde T$ satisfies each $R_i(\bar x_i)$ under the assignment $\alpha$. The instance $\tilde T$ will actually be the union of isomorphic copies $\rho_1(T_1), \ldots, \rho_k(T_k)$ of the instances $T_1, \ldots, T_k$. In particular, the set $\{\rho_1(T_1), \ldots, \rho_k(T_k)\}$ is already close to the desired set $\mathcal{T}$: it is a finite set of instances from $\mathit{min}_C(T)$, and its union satisfies the subformula $\bigwedge_{i=1}^{k} R_i(\bar x_i)$ of $\hat q$ under $\alpha$. Not all pairs $(T_1, \alpha_1), \ldots, (T_k, \alpha_k)$ can be joined together. We will join only pairs that are compatible in the sense of Definition 6.27 below. By taking care in how those pairs are joined together, and using Lemma 6.23, we can show that the desired set $\mathcal{T}$ exists if and only if the instance obtained from $\tilde T$ by adding a large enough, but constant, number of isomorphic copies of $T$ to $\tilde T$ satisfies $\hat q$.

More precisely, the algorithm proceeds as follows. Fix the constant

\[ s := k + \sum_{i=1}^{l} |\bar w_i| + 2 \cdot m. \]

In the above description, $s - k$ is the number of isomorphic copies of $T$ that will be added to $\tilde T$. For each $i \in \{1, \ldots, s\}$, pick an injective mapping

\[ \rho_i\colon \mathrm{dom}(T) \cup C \to \mathrm{Dom} \]

such that $\rho_i(c) = c$ for each $c \in \mathrm{const}(T) \cup C$, $\rho_i(\bot) \in \mathrm{Null}$ for each $\bot \in \mathrm{nulls}(T)$, and such that for all distinct $i, j \in \{1, \ldots, s\}$, we have $\mathrm{nulls}(\rho_i(T)) \cap \mathrm{nulls}(\rho_j(T)) = \emptyset$. Then compute, for each $i \in \{1, \ldots, k\}$, the set

\[ \mathcal{X}_i := \bigl\{ (\tilde T_0, \alpha) \bigm| \text{there is an atom block } B \text{ of } T \text{ and some } T_0 \in \mathit{min}_C(T,B) \text{ such that } \tilde T_0 = \rho_i(T_0),\ \alpha\colon X_i \to \mathrm{dom}(\tilde T_0), \text{ and } \alpha(\bar x_i) \in R_i^{\tilde T_0} \bigr\}. \tag{6.20} \]

Now we would like to join pairs $(T_1, \alpha_1) \in \mathcal{X}_1, \ldots, (T_k, \alpha_k) \in \mathcal{X}_k$ into single pairs $(\tilde T, \alpha)$ according to the above description. However, we would like to do this only if the pairs $(T_1, \alpha_1), \ldots, (T_k, \alpha_k)$ are compatible in the following sense. Intuitively, $(T_1, \alpha_1), \ldots, (T_k, \alpha_k)$ are compatible if the nulls in the image of each $\alpha_i$ can be consistently renamed such that the resulting mappings $\alpha_i$ agree on common variables.

Definition 6.27 (compatible). We say that $(T_1, \alpha_1) \in \mathcal{X}_1, \ldots, (T_k, \alpha_k) \in \mathcal{X}_k$ are compatible if there is an equivalence relation $\sim$ on $D := \bigcup_{i=1}^{k} \alpha_i(X_i)$ such that

(1) for all $i, j \in \{1, \ldots, k\}$ and $x \in X_i \cap X_j$, we have $\alpha_i(x) \sim \alpha_j(x)$,

(2) for all $u, u' \in D$, if $u \sim u'$ and $u \in \mathrm{Const}$, then $u = u'$, and

(3) for all $i \in \{1, \ldots, k\}$ and $x, x' \in X_i$, we have $\alpha_i(x) \sim \alpha_i(x')$ if and only if $\alpha_i(x) = \alpha_i(x')$.

Proposition 6.28. There is an algorithm that, given $(T_1, \alpha_1) \in \mathcal{X}_1, \ldots, (T_k, \alpha_k) \in \mathcal{X}_k$ as input, decides in time linear in the size of $T$ whether $(T_1, \alpha_1), \ldots, (T_k, \alpha_k)$ are compatible, and if so, outputs an equivalence relation $\sim$ on $D := \bigcup_{i=1}^{k} \alpha_i(X_i)$ that satisfies conditions (1)-(3) of Definition 6.27. In fact, $\sim$ is the smallest such equivalence relation (with respect to set inclusion).

Proof. Given $(T_1, \alpha_1) \in \mathcal{X}_1, \dots, (T_k, \alpha_k) \in \mathcal{X}_k$, the following algorithm computes the desired
relation $\sim$ if it exists:

(1) Initialize $\sim$ to be $\{(u, u) \mid u \in \alpha_i(X_i) \text{ for some } i \in \{1, \dots, k\}\}$.

(2) For all $i, j \in \{1, \dots, k\}$ and $x \in X_i \cap X_j$, add $(\alpha_i(x), \alpha_j(x))$ to $\sim$.

(3) For all $i \in \{1, \dots, k\}$ and $x, x' \in X_i$ with $\alpha_i(x) = \alpha_i(x')$, add $(\alpha_i(x), \alpha_i(x'))$ to $\sim$.

(4) Update $\sim$ to be the symmetric and transitive closure of $\sim$.

(5) If $\sim$ satisfies conditions (2) and (3) of Definition 6.27, then output $\sim$; otherwise output "not
compatible".

Since $k$ and $X_1, \dots, X_k$ are constant, it should be clear that each of the steps (1)–(5) can be
accomplished in constant time, after building the necessary data structures from the input
in time linear in the size of $T$ (note that each $T_i$ is at most as large as $T$, so that the length
of the input is linear in the size of $T$).

It is now not hard to see that if the algorithm outputs a relation $\sim$, then $\sim$ is an
equivalence relation on $D := \bigcup_{i=1}^{k} \alpha_i(X_i)$ that satisfies conditions (1)–(3) of Definition 6.27. In
particular, $(T_1, \alpha_1), \dots, (T_k, \alpha_k)$ are compatible. Even more, $\sim$ is the smallest such
equivalence relation, since every equivalence relation $\sim^*$ on $D$ that satisfies conditions (1)–(3) of
Definition 6.27 must contain the pairs put into $\sim$ in steps (1)–(4) of the algorithm. The same
argument shows that the algorithm outputs a relation $\sim$ if there is an equivalence relation
$\sim^*$ on $D$ that satisfies conditions (1)–(3) of Definition 6.27, that is, if $(T_1, \alpha_1), \dots, (T_k, \alpha_k)$ are
compatible. □
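The five steps of the proof can be sketched with a union-find structure, which maintains the symmetric and transitive closure implicitly. This is an illustrative sketch under assumed encodings: each assignment $\alpha_i$ is a dict from variable names to values, nulls are instances of a hypothetical `Null` class, and every other value counts as a constant. The function name `compatible` is likewise an assumption.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Null:
    """A labeled null; any value that is not a Null is treated as a constant."""
    name: str


def compatible(assignments):
    """Return the smallest relation ~ as a dict value -> class, or None.

    `assignments` is a list of dicts (one per alpha_i) mapping variables to values.
    """
    parent = {}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    # Step (1): start from the identity relation on D.
    for alpha in assignments:
        for u in alpha.values():
            parent.setdefault(u, u)
    # Step (2): values of a shared variable must be equivalent.
    for a, b in combinations(assignments, 2):
        for x in a.keys() & b.keys():
            ra, rb = find(a[x]), find(b[x])
            if ra != rb:
                parent[ra] = rb
    # Step (3) only adds pairs (u, u); step (4) is implicit in union-find.
    classes = {}
    for u in parent:
        classes.setdefault(find(u), set()).add(u)
    # Step (5), condition (2): a class containing a constant is a singleton.
    for cls in classes.values():
        if len(cls) > 1 and any(not isinstance(u, Null) for u in cls):
            return None
    # Step (5), condition (3): distinct values of one alpha_i stay inequivalent.
    for alpha in assignments:
        values = set(alpha.values())
        if len({find(u) for u in values}) != len(values):
            return None
    return {u: frozenset(classes[find(u)]) for u in parent}
```

For instance, two assignments that share the variable `x` and map it to different nulls are compatible (the two nulls are merged), whereas mapping `x` to a constant in one assignment and to a null in the other violates condition (2).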

We now define the join of compatible pairs $(T_1, \alpha_1) \in \mathcal{X}_1, \dots, (T_k, \alpha_k) \in \mathcal{X}_k$. Given
an equivalence relation $\sim$ on $D := \bigcup_{i=1}^{k} \alpha_i(X_i)$ as in Definition 6.27, the idea of the join
is to identify values $u, u' \in D$ with $u \sim u'$, and to "glue" the resulting instances $T_i$ and
assignments $\alpha_i$ together to a single instance $\tilde T$ and assignment $\tilde\alpha$.
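Definition 6.29 (Join). Let $(T_1, \alpha_1) \in \mathcal{X}_1, \dots, (T_k, \alpha_k) \in \mathcal{X}_k$ be compatible, and let $\sim$
be the smallest equivalence relation on $D := \bigcup_{i=1}^{k} \alpha_i(X_i)$ that satisfies conditions (1)–(3) of
Definition 6.27. Pick some linear order $\preceq$ on the elements of $D$, and for each $u \in D$, let $\hat u$ be
the minimal element in $[u] := \{u' \in D \mid u' \sim u\}$ with respect to $\preceq$. For each $i \in \{1, \dots, k\}$,
define $r_i \colon \mathrm{dom}(T_i) \to \mathit{Dom}$ such that for each $u \in \mathrm{dom}(T_i)$,

\[ r_i(u) := \begin{cases} \hat u, & \text{if } u \in \alpha_i(X_i), \\ u, & \text{otherwise.} \end{cases} \]

Then the join of $(T_1, \alpha_1), \dots, (T_k, \alpha_k)$ is the pair $(\tilde T, \tilde\alpha)$, where $\tilde T := \bigcup_{i=1}^{k} r_i(T_i)$, and
$\tilde\alpha \colon \bigcup_{i=1}^{k} X_i \to \mathit{Dom}$ is such that for each $x \in \bigcup_{i=1}^{k} X_i$,

\[ \tilde\alpha(x) := \begin{cases} r_1(\alpha_1(x)), & \text{if } x \in X_1, \\ \quad\vdots \\ r_k(\alpha_k(x)), & \text{if } x \in X_k. \end{cases} \]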

Note that different choices of $\preceq$ yield different joins. For definiteness, we can generate
$\preceq$ as follows. Initialize $\preceq$ to be the empty relation. For increasing $i = 1, 2, \dots, k$, consider
the variables $x \in X_i$ in some predefined fixed order, and if $u := \alpha_i(x)$ does not already
occur in $\preceq$, add $u$ as the new maximal element to $\preceq$. This takes constant time, since $k$ and
$X_1, \dots, X_k$ are fixed. For the following construction, it is not important that the join always
yields the same result - the join resulting from any linear ordering $\preceq$ on $D$ is fine. What is
important are the properties summarized in Proposition 6.30 below. Note also that modulo
the choice of $\preceq$, $\tilde\alpha$ is well-defined by the construction of $\sim$ and $r_1, \dots, r_k$: if $x \in X_i \cap X_j$,
then $\alpha_i(x) \sim \alpha_j(x)$, and thus, $r_i(\alpha_i(x)) = r_j(\alpha_j(x))$.
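The join construction, including the order-of-first-occurrence choice of the linear order just described, can be sketched as follows. This is a hedged illustration, not the paper's code: instances are assumed to be sets of atoms `(relation, args)`, assignments are dicts, and `classes` is assumed to map each value of $D$ to its equivalence class under the smallest relation $\sim$ (for example as produced by a compatibility check). The names `Null` and `join` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Null:
    """A labeled null; any value that is not a Null is treated as a constant."""
    name: str


def join(pairs, classes):
    """Glue compatible pairs (T_i, alpha_i) into one pair (T, alpha).

    pairs:   list of (T_i, alpha_i); T_i is a set of atoms (rel, args),
             alpha_i a dict variable -> value.
    classes: dict mapping each value in D to its class (a frozenset).
    """
    # Linear order on D: rank values by first occurrence, going through the
    # pairs in order and through each alpha_i's variables in sorted order.
    rank = {}
    for _, alpha in pairs:
        for x in sorted(alpha):
            rank.setdefault(alpha[x], len(rank))
    # Representative of each class: its minimal element in that order.
    rep = {u: min(classes[u], key=rank.__getitem__) for u in rank}

    T, joined_alpha = set(), {}
    for T_i, alpha in pairs:
        image = set(alpha.values())

        def r_i(u, image=image):
            # r_i replaces values in the image of alpha_i by representatives
            # and leaves all other values of T_i untouched.
            return rep[u] if u in image else u

        T |= {(rel, tuple(r_i(u) for u in args)) for rel, args in T_i}
        for x, u in alpha.items():
            joined_alpha[x] = r_i(u)
    return T, joined_alpha
```

Because equivalent values share a representative, a variable that occurs in several assignments receives the same value from each of them, which mirrors the well-definedness remark above.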

Proposition 6.30. The join $(\tilde T, \tilde\alpha)$ of compatible pairs $(T_1, \alpha_1) \in \mathcal{X}_1, \dots, (T_k, \alpha_k) \in \mathcal{X}_k$ can
be computed in time linear in the size of $T$ and has the following properties. Let $r_1, \dots, r_k$
be the mappings used in the construction of $(\tilde T, \tilde\alpha)$. Then for all $i, j \in \{1, \dots, k\}$:

(1) For all $c \in \mathrm{const}(T_i)$ and $\bot \in \mathrm{nulls}(T_i)$, we have $r_i(c) = c$ and $r_i(\bot) \in \mathit{Null}$.

(2) Let $\sim$ be an equivalence relation on $D := \bigcup_{i=1}^{k} \alpha_i(X_i)$ that satisfies conditions (1)–(3) of
Definition 6.27. Then for all $u \in \mathrm{dom}(T_i)$ and $u' \in \mathrm{dom}(T_j)$,

\[ r_i(u) = r_j(u') \implies u = u', \text{ or: } u \in \alpha_i(X_i),\ u' \in \alpha_j(X_j) \text{ and } u \sim u'. \]

Furthermore, if $\sim$ is the smallest such equivalence relation, then

\[ r_i(u) = r_j(u') \iff u = u', \text{ or: } u \in \alpha_i(X_i),\ u' \in \alpha_j(X_j) \text{ and } u \sim u'. \]

(3) $r_i$ is injective.

(4) $\tilde\alpha(\bar x_i) \in R_i^{\tilde T}$.

Proof. Let us first see that $(\tilde T, \tilde\alpha)$ can be computed in time $O(n)$, where $n$ is the size of $T$,
given $(T_1, \alpha_1), \dots, (T_k, \alpha_k)$ as input. By Proposition 6.28, the relation $\sim$ can be computed in
time $O(n)$. Since $k$ and $X_1, \dots, X_k$ are fixed, the linear order $\preceq$ can be computed in constant
time. Furthermore, since $k$ is constant and all $T_i$ have size at most $n$, the mappings $r_i$ and
the join $(\tilde T, \tilde\alpha)$ can be computed in time $O(n)$. We next prove (1)–(4).

Ad (1): Let $c \in \mathrm{const}(T_i)$. If $c \notin \alpha_i(X_i)$, then by the construction of $r_i$ we have $r_i(c) = c$.
Otherwise, if $c \in \alpha_i(X_i)$, there is some $x \in X_i$ with $\alpha_i(x) = c$, so that by the construction
of $r_i$ we have $c = \alpha_i(x) \sim r_i(\alpha_i(x)) = r_i(c)$. Condition (2) of Definition 6.27 then yields
$r_i(c) = c$. Next let $\bot \in \mathrm{nulls}(T_i)$. As above, if $\bot \notin \alpha_i(X_i)$, then $r_i(\bot) = \bot \in \mathit{Null}$.
Otherwise, $\bot \sim r_i(\bot)$, so that by condition (2) of Definition 6.27, $r_i(\bot) \in \mathit{Null}$.

Ad (2): Let $\sim$ be an equivalence relation on $D$ that satisfies conditions (1)–(3) of Definition 6.27.
We first prove the statement for the case that $\sim$ is the smallest such relation. That is, we
have to show

\[ r_i(u) = r_j(u') \iff u = u', \text{ or: } u \in \alpha_i(X_i),\ u' \in \alpha_j(X_j) \text{ and } u \sim u'. \tag{6.21} \]

We first prove the direction from right to left. Suppose that $u = u'$. If $i = j$, we have
$r_i(u) = r_j(u')$. If $i \neq j$, then $\mathrm{nulls}(T_i) \cap \mathrm{nulls}(T_j) = \emptyset$ implies that $u$ and $u'$ are constants,
and therefore $r_i(u) = u = u' = r_j(u')$ by (1). Suppose next that $u \in \alpha_i(X_i)$, $u' \in \alpha_j(X_j)$ and
$u \sim u'$. Pick $x \in X_i$ and $x' \in X_j$ such that $\alpha_i(x) = u$ and $\alpha_j(x') = u'$. Then $\alpha_i(x) \sim \alpha_j(x')$,
and the construction of $r_i$ and $r_j$ immediately implies $r_i(u) = r_j(u')$.

We next prove the direction from left to right. Let $r_i(u) = r_j(u')$. We distinguish the
following cases:

(a) $u \notin \alpha_i(X_i) \cap \mathit{Null}$ and $u' \notin \alpha_j(X_j) \cap \mathit{Null}$.

(b) $u \in \alpha_i(X_i) \cap \mathit{Null}$ or $u' \in \alpha_j(X_j) \cap \mathit{Null}$.

In case (a), by the construction of $r_i, r_j$ and by (1) we have $r_i(u) = u$ and $r_j(u') = u'$. Since
$r_i(u) = r_j(u')$, this implies $u = u'$.

So assume case (b). By symmetry it suffices to deal with the case that $u \in \alpha_i(X_i) \cap \mathit{Null}$.
By the construction of $r_i$, we then have

\[ u \sim r_i(u) = r_j(u'). \]

We claim that $u' \in \alpha_j(X_j)$. Suppose, to the contrary, that $u' \notin \alpha_j(X_j)$. By the construction
of $r_j$, we have

\[ u' = r_j(u') = r_i(u) \sim u. \]

Note that $u \in \alpha_i(X_i)$, $r_i(u) = u'$ and the construction of $r_i$ imply that $u' \in D$. Pick
$p \in \{1, \dots, k\}$ and $x \in X_p$ with $\alpha_p(x) = u'$. By $u \in \mathit{Null}$, $r_i(u) = u'$ and (1) we have $u' \in \mathit{Null}$.
Moreover, since $u' \in \mathrm{nulls}(T_j)$, $u' = \alpha_p(x) \in \mathrm{nulls}(T_p)$, and $\mathrm{nulls}(T_j) \cap \mathrm{nulls}(T_p) = \emptyset$ for
$j \neq p$, we have $p = j$. This, however, implies that $u' \in \alpha_j(X_j)$, which is a contradiction to
our assumption that $u' \notin \alpha_j(X_j)$. Hence, $u' \in \alpha_j(X_j)$. By the construction of $r_j$, we have
$u' \sim r_j(u') \sim u$. In particular, $u \in \alpha_i(X_i)$, $u' \in \alpha_j(X_j)$ and $u \sim u'$, as desired.

Finally, let $\sim^*$ be another equivalence relation on $D$ that satisfies conditions (1)–(3) of
Definition 6.27. We show that

\[ r_i(u) = r_j(u') \implies u = u', \text{ or: } u \in \alpha_i(X_i),\ u' \in \alpha_j(X_j) \text{ and } u \sim^* u'. \]

Let $r_i(u) = r_j(u')$. By (6.21), we have $u = u'$, or: $u \in \alpha_i(X_i)$, $u' \in \alpha_j(X_j)$ and $u \sim u'$. By
minimality of $\sim$, $u \sim u'$ implies $u \sim^* u'$, so that $u = u'$, or: $u \in \alpha_i(X_i)$, $u' \in \alpha_j(X_j)$ and
$u \sim^* u'$, as desired.

Ad (3): Let $u, u' \in \mathrm{dom}(T_i)$ be such that $r_i(u) = r_i(u')$. We have to show that $u = u'$. By (2)
we have $u = u'$, or: $u, u' \in \alpha_i(X_i)$ and $u \sim u'$. If $u = u'$, we are done. So assume that
$u, u' \in \alpha_i(X_i)$ and $u \sim u'$. Let $x, x' \in X_i$ be such that $\alpha_i(x) = u$ and $\alpha_i(x') = u'$. Then
$\alpha_i(x) \sim \alpha_i(x')$, and by condition (3) of Definition 6.27, we have $u = \alpha_i(x) = \alpha_i(x') = u'$, as
desired.

Ad (4): This follows immediately from the construction of $\tilde T$, $\tilde\alpha$, and (1). □
We can now give the algorithm for the problem COREEVAL:

Algorithm 6.31 (Main algorithm).

Input: an instance $T$ over $\tau$ that is a core and each atom block of $T$ is packed and contains
at most $bs$ nulls; a tuple $\bar t \in \mathit{Const}^{|\bar x|}$
Output: "yes" if there is a nonempty finite set $\mathcal{T}$ of minimal instances in $\mathit{poss}(T)$ such that
$\bigcup \mathcal{T} \models \neg q(\bar t)$; otherwise "no"

(1) Compute $\bar q = \exists \bar y\, \psi(\bar y)$ and choose $\rho_1, \dots, \rho_s$. (Recall that each $\rho_i$ is an injective mapping
from $\mathrm{dom}(T) \cup C$ to $\mathit{Dom}$ that is the identity on constants, maps nulls to nulls, and that
$\mathrm{nulls}(\rho_i(T)) \cap \mathrm{nulls}(\rho_j(T)) = \emptyset$ for distinct $i, j$.)

(2) Compute the sets $\mathcal{X}_1, \dots, \mathcal{X}_k$ according to (6.20).

(3) For all $(T_1, \alpha_1) \in \mathcal{X}_1, \dots, (T_k, \alpha_k) \in \mathcal{X}_k$:

(a) Check whether $(T_1, \alpha_1), \dots, (T_k, \alpha_k)$ are compatible;
if not, continue with next $(T_1, \alpha_1), \dots, (T_k, \alpha_k)$.

(b) Let $(\tilde T, \tilde\alpha)$ be the join of $(T_1, \alpha_1), \dots, (T_k, \alpha_k)$.

(c) If $\tilde T \cup \bigcup_{i=k+1}^{s} \rho_i(T)$ satisfies $\bar q$, output "yes".

(4) Output "no".

Let us now show that the algorithm decides COREEVAL in polynomial time. For a more
precise upper bound on the algorithm's running time, see [21, Lemma 5.40].
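The control flow of step (3) can be sketched as a plain loop over the Cartesian product of the candidate sets. This is only a skeleton under stated assumptions: the compatibility test, the join, and the query check are passed in as callables (the parameter names `compatible`, `join`, and `satisfies` are hypothetical), and instances are modelled as Python sets.

```python
from itertools import product


def core_eval(candidate_sets, padding, compatible, join, satisfies):
    """Skeleton of the main loop of Algorithm 6.31.

    candidate_sets: the sets X_1, ..., X_k, each a list of candidate pairs
    padding:        the union of the copies rho_{k+1}(T), ..., rho_s(T)
    compatible:     callable returning an equivalence relation or None
    join:           callable returning (joined instance, joined assignment)
    satisfies:      callable deciding whether an instance satisfies the query
    """
    # Since k is fixed and every X_i has polynomial size, the product has
    # polynomially many elements.
    for combo in product(*candidate_sets):
        relation = compatible(combo)          # step 3(a)
        if relation is None:
            continue
        joined, _alpha = join(combo, relation)  # step 3(b)
        if satisfies(joined | padding):       # step 3(c)
            return "yes"
    return "no"                               # step (4)
```

The loop stops at the first compatible combination whose padded join satisfies the query, so the number of iterations is bounded by the product of the sizes of the candidate sets.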

Lemma 6.32. Algorithm 6.31 runs in time polynomial in the size of $T$. Furthermore, the
following two statements are equivalent:

(1) There is a nonempty finite set $\mathcal{T}$ of minimal instances in $\mathit{poss}(T)$ such that $\bigcup \mathcal{T} \models \bar q$.

(2) Algorithm 6.31 outputs "yes" on input $T$ and $\bar t$.

Proof. It is not hard to see that the algorithm runs in time polynomial in the size of $T$.
Indeed, the transformation from $q$ to $\bar q$ can be accomplished in constant time (since $q$ is
fixed), and the mappings $\rho_1, \dots, \rho_s$ can be generated in polynomial time. It is also not hard
to compute the sets $\mathcal{X}_1, \dots, \mathcal{X}_k$ in polynomial time: All we need to do in order to compute
$\mathcal{X}_i$ for $i \in \{1, \dots, k\}$ is to iterate through all $T_0' \in \mathit{minc}(T, B)$, where $B$ is an atom block
of $T$, and all assignments $\alpha \colon X_i \to \mathrm{dom}(\rho_i(T_0'))$, and to check whether $R_i(\alpha(\bar x_i)) \in \rho_i(T_0')$.
By Proposition 6.18 and since $X_i$ is fixed, this can be done in polynomial time. Since $k$ is
constant, all the sets $\mathcal{X}_1, \dots, \mathcal{X}_k$ can thus be computed in polynomial time. In particular,
since each of these sets has polynomial size, there are at most a polynomial number of
iterations of the algorithm's main loop. Propositions 6.28 and 6.30 imply that steps 3(a)
and 3(b) of the main loop run in polynomial time. Finally, step 3(c) clearly takes only a
polynomial number of steps. Altogether, the algorithm runs in polynomial time.
It remains to show that the two statements (1) and (2) are equivalent.

$(2) \Rightarrow (1)$: Assume that Algorithm 6.31 outputs "yes" on input $T$ and $\bar t$. Then there are
compatible $(T_1, \alpha_1) \in \mathcal{X}_1, \dots, (T_k, \alpha_k) \in \mathcal{X}_k$ such that the join $(\tilde T, \tilde\alpha)$ of $(T_1, \alpha_1), \dots, (T_k, \alpha_k)$
has the following property: the instance $T^* := \tilde T \cup \bigcup_{i=k+1}^{s} \rho_i(T)$ satisfies $\bar q$. We construct a
nonempty finite set $\mathcal{T}$ of minimal instances in $\mathit{poss}(T)$ with $\bigcup \mathcal{T} \models \bar q$.

By Proposition 6.30, we have $\tilde T = \bigcup_{i=1}^{k} r_i(T_i)$, where each $r_i$ is an injective mapping
from $\mathrm{dom}(T_i)$ to $\mathit{Dom}$ with $r_i(c) = c$ for each $c \in \mathrm{const}(T_i)$, and $r_i(\bot) \in \mathit{Null}$ for each
$\bot \in \mathrm{nulls}(T_i)$. For each $i \in \{k+1, \dots, s\}$, let $T_i := \rho_i(T)$, and let $r_i$ be the identity
mapping on $\mathrm{dom}(T_i)$. Then,

\[ T^* = \bigcup_{i=1}^{s} r_i(T_i). \tag{6.22} \]

Note that each $T_i$ is isomorphic to an instance $\hat T_i \in \mathit{minc}(T)$. For $i \in \{1, \dots, k\}$, this follows
from Lemma 6.19 and the fact that $T_i$ is isomorphic to an instance in $\mathit{minc}(T, B)$ for some
atom block $B$ of $T$. For $i \in \{k+1, \dots, s\}$, this follows from the fact that $T_i = \rho_i(T)$, that
$T$ is a core, and Proposition 6.11. For each $i \in \{1, \dots, s\}$, let $f_i$ be an isomorphism from
$\hat T_i$ to $T_i$.

Let $v \colon \mathrm{dom}(T^*) \to \mathrm{const}(T^*) \cup (\mathit{Const} \setminus C)$ be an injective valuation of $T^*$, and for every
$i \in \{1, \dots, s\}$ let

\[ v_i := v \circ r_i \circ f_i. \]

Note that $v_i$ is an injective valuation of $\hat T_i$. To see this, note that $f_i$ is an injective mapping
from $\mathrm{dom}(\hat T_i)$ to $\mathrm{dom}(T_i)$ that is legal for $\hat T_i$, that $r_i$ is an injective mapping from $\mathrm{dom}(T_i)$
to $\mathrm{dom}(T^*)$ that is legal for $T_i$, and that $v$ is an injective valuation of $T^*$. Furthermore, for
each $\bot \in \mathrm{nulls}(\hat T_i)$ we have $v_i(\bot) \notin C$, since both $f_i$ and $r_i$ map nulls to nulls, and $v$ maps
nulls to constants in $\mathit{Const} \setminus C$. In summary, $\hat T_i \in \mathit{minc}(T)$, $v_i$ is an injective valuation of $\hat T_i$,
and $v_i^{-1}(c) = c$ for all $c \in \mathrm{dom}(v_i(\hat T_i)) \cap C$. Together with Proposition 6.11(1), this implies
that $v_i(\hat T_i)$ is a minimal instance in $\mathit{poss}(T)$.
So,

\[ \mathcal{T} := \{ v_i(\hat T_i) \mid 1 \leq i \leq s \} \]

is a finite nonempty set of minimal instances in $\mathit{poss}(T)$, and

\[ \bigcup \mathcal{T} = \bigcup_{i=1}^{s} v_i(\hat T_i) = \bigcup_{i=1}^{s} v(r_i(T_i)) = v\Bigl(\bigcup_{i=1}^{s} r_i(T_i)\Bigr) \overset{(6.22)}{=} v(T^*). \]

Since $T^* \models \bar q$, $v$ is injective, and $v$ maps nulls in $T^*$ to constants that do not occur in $\bar q$, we
conclude that $\bigcup \mathcal{T} \models \bar q$.

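$(1) \Rightarrow (2)$: Assume that there is a nonempty finite set $\mathcal{T}$ of minimal instances in $\mathit{poss}(T)$ such
that $\bigcup \mathcal{T} \models \bar q$. We show that Algorithm 6.31 outputs "yes" on input $T$ and $\bar t$.
Since $\bigcup \mathcal{T} \models \bar q$, there is an assignment $\beta \colon \bar y \to \mathrm{dom}(\bigcup \mathcal{T}) \cup C$ with

\[ \bigcup \mathcal{T} \models \psi(\beta). \]

In particular, we can pick for each $i \in \{1, \dots, k\}$ an instance $\check T_i \in \mathcal{T}$ such that

\[ R_i(\beta(\bar x_i)) \in \check T_i. \]

Note that there are at most $s - k$ values in $\beta(\bar y) \setminus C$ that do not occur in $\beta(\bar x_i)$ for some
$i \in \{1, \dots, k\}$. Thus, we can fix instances $\check T_{k+1}, \dots, \check T_s \in \mathcal{T}$ such that each of the values in
$\beta(\bar y) \setminus C$ that does not occur in $\beta(\bar x_i)$ for some $i \in \{1, \dots, k\}$ belongs to $\mathrm{dom}(\check T_j)$ for some
$j \in \{k+1, \dots, s\}$. Now $\beta$ is an assignment for $\psi$ with range in $\mathrm{dom}(\check T_1 \cup \dots \cup \check T_s) \cup C$, and
we have:

\[ \bigcup_{i=1}^{s} \check T_i \models \psi(\beta). \tag{6.23} \]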
Let $i \in \{1, \dots, k\}$. By Proposition 6.11(1), there is an instance $\hat T_i \in \mathit{minc}(T)$ and an
injective valuation $v_i$ of $\hat T_i$ such that $v_i(\hat T_i) = \check T_i$, and $v_i^{-1}(c) = c$ for all $c \in \mathrm{dom}(\check T_i) \cap C$. In
particular,

\[ A_i := R_i(v_i^{-1}(\beta(\bar x_i))) \in \hat T_i. \]

By Lemma 6.23, there is an atom block $B_i$ of $T$, an instance $T_i' \in \mathit{minc}(T, B_i)$, an atom
$A_i'' \in T_i'$ with $A_i'' \cong A_i$, and a homomorphism $h_i'$ from $T_i'$ to $\hat T_i$ such that $h_i'(T_i') = \hat T_i$ and
$h_i'(A_i'') = A_i$. In particular, $h_i := h_i' \circ \rho_i^{-1}$ is a homomorphism from $T_i := \rho_i(T_i')$ to $\hat T_i$ with

\[ h_i(T_i) = \hat T_i \quad\text{and}\quad h_i(A_i') = A_i, \]

where $A_i' := \rho_i(A_i'') \cong A_i$. Let $\alpha_i$ be an assignment for $X_i$ such that

\[ A_i' = R_i(\alpha_i(\bar x_i)). \]

Note that $(T_i, \alpha_i) \in \mathcal{X}_i$.

In the following, we show that $(T_1, \alpha_1), \dots, (T_k, \alpha_k)$ are compatible, and if $(\tilde T, \tilde\alpha)$ is the
join of these pairs, then $\tilde T \cup \bigcup_{i=k+1}^{s} \rho_i(T)$ satisfies $\bar q$. In particular, Algorithm 6.31 outputs
"yes" on input $T$ and $\bar t$.

The following properties of the assignments $\alpha_i$ are crucial for showing this:

Claim 1. Let $i, j \in \{1, \dots, k\}$, $x, x' \in X_i$ and $x'' \in X_j$. Then,

(1) $v_i(h_i(\alpha_i(\bar x_i))) = \beta(\bar x_i)$. In particular, $v_i(h_i(\alpha_i(x))) = \beta(x)$.

(2) If $\beta(x) \in \mathrm{const}(T) \cup C$, then $\alpha_i(x) = \beta(x)$.

(3) $\alpha_i(x) = \alpha_i(x')$ if and only if $\beta(x) = \beta(x')$.

(4) $\alpha_i(x) = \alpha_j(x'')$ implies $\beta(x) = \beta(x'')$.

Proof. Ad (1): Recall that $h_i(A_i') = A_i \in \hat T_i$, and that $v_i$ is injective on $\mathrm{dom}(\hat T_i)$. In particular,
we have $h_i(\alpha_i(\bar x_i)) = v_i^{-1}(\beta(\bar x_i))$. Applying $v_i$ to both sides yields $v_i(h_i(\alpha_i(\bar x_i))) = \beta(\bar x_i)$.

Ad (2): Let $\beta(x) \in \mathrm{const}(T) \cup C$. By (1) we have

\[ v_i(h_i(\alpha_i(x))) = \beta(x), \tag{6.24} \]

which implies

\[ h_i(\alpha_i(x)) = \beta(x). \tag{6.25} \]

Indeed, if $\beta(x) \in \mathrm{const}(T)$, (6.25) follows immediately from (6.24), $\mathrm{const}(T) \subseteq \mathrm{const}(\hat T_i)$,
and the fact that $v_i$ is an injective mapping from $\mathrm{dom}(\hat T_i)$ that is the identity on constants.
On the other hand, if $\beta(x) \in C$, then (6.25) follows immediately from (6.24), $\beta(x) \in \mathrm{dom}(\check T_i)$,
and the fact that $v_i^{-1}(c) = c$ for all $c \in \mathrm{dom}(\check T_i) \cap C$.

Now (6.25) and $h_i(A_i') = A_i \cong A_i'$ imply that $\alpha_i(x)$ is a constant, and since $h_i$ is the identity
on constants, we have $\alpha_i(x) = \beta(x)$.

Ad (3): By (1) we have $v_i(h_i(\alpha_i(\bar x_i))) = \beta(\bar x_i)$. Recall also that $v_i$ is injective, and that
$h_i(A_i') = A_i \cong A_i'$, which implies that $h_i$ is injective on $\alpha_i(X_i)$. Altogether, $f_i := v_i \circ h_i$ is a
bijection from $\alpha_i(X_i)$ to $\beta(X_i)$. This implies that $\alpha_i(x) = \alpha_i(x')$ if and only if $\beta(x) = \beta(x')$.

Ad (4): Let $\alpha_i(x) = \alpha_j(x'')$. If $i = j$, then $\beta(x) = \beta(x'')$ follows immediately from (3). So
assume that $i \neq j$. Since $\alpha_i(x) \in \mathrm{dom}(T_i)$, $\alpha_j(x'') \in \mathrm{dom}(T_j)$ and $\mathrm{nulls}(T_i) \cap \mathrm{nulls}(T_j) = \emptyset$,
$\alpha_i(x)$ and $\alpha_j(x'')$ must be constants. By (1) and the fact that the homomorphisms $h_i, h_j$ as
well as the valuations $v_i, v_j$ are the identity on constants, we conclude that $\beta(x) = \alpha_i(x) =
\alpha_j(x'') = \beta(x'')$. $\lrcorner$
We now show that $(T_1, \alpha_1), \dots, (T_k, \alpha_k)$ are compatible. To this end, we consider the
relation

\[ \sim\ := \{ (\alpha_i(x), \alpha_j(x')) \mid i, j \in \{1, \dots, k\},\ x \in X_i,\ x' \in X_j,\ \beta(x) = \beta(x') \} \]

on $D := \bigcup_{i=1}^{k} \alpha_i(X_i)$.

Claim 2.

(1) For all $i, j \in \{1, \dots, k\}$, $x \in X_i$ and $x' \in X_j$, we have

\[ \alpha_i(x) \sim \alpha_j(x') \iff \beta(x) = \beta(x'). \]

(2) The relation $\sim$ is an equivalence relation on $D$ that satisfies conditions (1)–(3) of
Definition 6.27.

Proof. Ad (1): Let $i, j \in \{1, \dots, k\}$, $x \in X_i$ and $x' \in X_j$. If $\beta(x) = \beta(x')$, then the definition
of $\sim$ immediately yields $\alpha_i(x) \sim \alpha_j(x')$.

On the other hand, let $\alpha_i(x) \sim \alpha_j(x')$. Then there are $i', j' \in \{1, \dots, k\}$, $y \in X_{i'}$ and
$y' \in X_{j'}$ such that

\[ \alpha_{i'}(y) = \alpha_i(x) \quad\text{and}\quad \alpha_{j'}(y') = \alpha_j(x'), \tag{6.26} \]

and

\[ \beta(y) = \beta(y'). \tag{6.27} \]

By (6.26) and Claim 1(4), we have $\beta(y) = \beta(x)$ and $\beta(y') = \beta(x')$, which by (6.27) yields
$\beta(x) = \beta(x')$, as desired.

Ad (2): It is easy to verify that $\sim$ is an equivalence relation on $D$. Reflexivity and symmetry
are clear, and transitivity is easy to show using (1).

It follows easily from (1) that $\sim$ satisfies condition (1) of Definition 6.27. Let $i, j \in \{1, \dots, k\}$
and $x \in X_i \cap X_j$. Since $\beta(x) = \beta(x)$, (1) yields $\alpha_i(x) \sim \alpha_j(x)$.

For proving that $\sim$ satisfies condition (2) of Definition 6.27, let $u, u' \in D$ be such that
$u \sim u'$ and $u \in \mathit{Const}$. Since $u \sim u'$, there are $i, j \in \{1, \dots, k\}$, $x \in X_i$ and $x' \in X_j$ such
that $\alpha_i(x) = u$, $\alpha_j(x') = u'$, and

\[ \beta(x) = \beta(x'). \tag{6.28} \]

By Claim 1(1), we have $v_i(h_i(\alpha_i(x))) = \beta(x)$. Since $\alpha_i(x)$ is a constant and $h_i, v_i$ are the
identity on constants, this implies that $\alpha_i(x) = \beta(x)$. In particular,

\[ \beta(x') \overset{(6.28)}{=} \beta(x) = \alpha_i(x) \in \mathrm{const}(T_i) \subseteq \mathrm{const}(T) \cup C. \tag{6.29} \]

By Claim 1(2), this yields $\beta(x') = \alpha_j(x')$, and therefore,

\[ u = \alpha_i(x) \overset{(6.29)}{=} \beta(x') = \alpha_j(x') = u', \]

as desired.

Finally, for proving that $\sim$ satisfies condition (3) of Definition 6.27, let $i \in \{1, \dots, k\}$ and
$x, x' \in X_i$. Then,

\[ \alpha_i(x) = \alpha_i(x') \overset{\text{Claim 1(3)}}{\iff} \beta(x) = \beta(x') \overset{\text{Claim 2(1)}}{\iff} \alpha_i(x) \sim \alpha_i(x'), \]

as desired.
By Claim 2, $(T_1, \alpha_1), \dots, (T_k, \alpha_k)$ are compatible. Let $(T_0, \alpha_0)$ be their join. We show
that

\[ T^* := T_0 \cup \bigcup_{i=k+1}^{s} T_i \]

satisfies $\bar q$, where $T_i := \rho_i(T)$ for each $i \in \{k+1, \dots, s\}$. To this end, we construct an
assignment $\alpha$ for $\psi$ such that $T^* \models \psi(\alpha)$.

Claim 3. There is a homomorphism $h_0$ from $T_0$ to $\check T_0 := \bigcup_{i=1}^{k} \check T_i$ with $h_0(T_0) = \check T_0$, and
$h_0(\alpha_0(\bar x_i)) = \beta(\bar x_i)$ for each $i \in \{1, \dots, k\}$.

Proof. Let $r_1, \dots, r_k$ be the mappings used to construct the join $(T_0, \alpha_0)$. Then,

\[ T_0 = \bigcup_{i=1}^{k} r_i(T_i), \tag{6.30} \]

and for all $i \in \{1, \dots, k\}$ and $x \in X_i$,

\[ \alpha_0(x) = r_i(\alpha_i(x)). \tag{6.31} \]

By Proposition 6.30, each $r_i$ is injective; furthermore, for all $i, j \in \{1, \dots, k\}$, $u \in \mathrm{dom}(T_i)$
and $u' \in \mathrm{dom}(T_j)$,

\[ r_i(u) = r_j(u') \implies u = u', \text{ or: } u \in \alpha_i(X_i),\ u' \in \alpha_j(X_j) \text{ and } u \sim u'. \tag{6.32} \]

Define $h_0 \colon \mathrm{dom}(T_0) \to \mathrm{dom}(\check T_0)$ such that for all $i \in \{1, \dots, k\}$ and $u \in \mathrm{dom}(r_i(T_i))$,

\[ h_0(u) = v_i(h_i(r_i^{-1}(u))). \tag{6.33} \]

We claim that $h_0$ is a homomorphism from $T_0$ to $\check T_0$ with $h_0(T_0) = \check T_0$, and that for each
$i \in \{1, \dots, k\}$ we have $h_0(\alpha_0(\bar x_i)) = \beta(\bar x_i)$.

Step 1: $h_0$ is well-defined.

Let $u \in \mathrm{dom}(r_i(T_i)) \cap \mathrm{dom}(r_j(T_j))$, where $i, j \in \{1, \dots, k\}$ are distinct. Let $u_i := r_i^{-1}(u) \in
\mathrm{dom}(T_i)$ and $u_j := r_j^{-1}(u) \in \mathrm{dom}(T_j)$. We must show that

\[ v_i(h_i(u_i)) = v_j(h_j(u_j)). \]

Since $r_i(u_i) = u = r_j(u_j)$, (6.32) implies that $u_i = u_j$, or: $u_i \in \alpha_i(X_i)$, $u_j \in \alpha_j(X_j)$ and
$u_i \sim u_j$. If $u_i = u_j$, then both $u_i$ and $u_j$ are constants, since $\mathrm{nulls}(T_i) \cap \mathrm{nulls}(T_j) = \emptyset$ for
$i \neq j$; therefore,

\[ v_i(h_i(u_i)) = u_i = u_j = v_j(h_j(u_j)), \]

as desired. On the other hand, let $x_i \in X_i$ and $x_j \in X_j$ such that $u_i = \alpha_i(x_i)$, $u_j = \alpha_j(x_j)$
and $\alpha_i(x_i) \sim \alpha_j(x_j)$. Then Claim 2(1) implies $\beta(x_i) = \beta(x_j)$. By Claim 1(1),

\[ v_i(h_i(u_i)) = v_i(h_i(\alpha_i(x_i))) = \beta(x_i) = \beta(x_j) = v_j(h_j(\alpha_j(x_j))) = v_j(h_j(u_j)), \]

as desired. Altogether, this shows that $h_0$ is well-defined.

Step 2: $h_0$ is a homomorphism from $T_0$ to $\check T_0$ with $h_0(T_0) = \check T_0$.
First note that for each $i \in \{1, \dots, k\}$, we have

\[ h_0(r_i(T_i)) = v_i(h_i(T_i)) = \check T_i. \]






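Hence,

\[ h_0(T_0) = h_0\Bigl(\bigcup_{i=1}^{k} r_i(T_i)\Bigr) = \bigcup_{i=1}^{k} h_0(r_i(T_i)) = \bigcup_{i=1}^{k} \check T_i = \check T_0. \]

Step 3: For each $i \in \{1, \dots, k\}$, we have $h_0(\alpha_0(\bar x_i)) = \beta(\bar x_i)$.
We have

\[ h_0(\alpha_0(\bar x_i)) \overset{(6.31)}{=} h_0(r_i(\alpha_i(\bar x_i))) \overset{(6.33)}{=} v_i(h_i(\alpha_i(\bar x_i))) \overset{\text{Claim 1(1)}}{=} \beta(\bar x_i). \qquad \lrcorner \]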

Let $h_0$ be a homomorphism as in Claim 3. It is easy to extend $h_0$ to a mapping $h$ on
$\mathrm{dom}(T^*) \cup C$ with the following properties:

(1) $h(T_0) = h_0(T_0) = \bigcup_{i=1}^{k} \check T_i$,

(2) $h(T_i) = \check T_i$ for each $i \in \{k+1, \dots, s\}$, and

(3) $h(c) = c$ for each $c \in C$.

Note that the second condition can be satisfied, since for all distinct $i \in \{k+1, \dots, s\}$ and
$j \in \{1, \dots, s\}$, we have $\mathrm{nulls}(T_i) \cap \mathrm{nulls}(T_j) = \emptyset$, $T_i \cong T$ and $\check T_i \in \mathit{poss}(T)$. Note also that

\[ h(T^*) = \bigcup_{i=1}^{s} \check T_i. \tag{6.34} \]

Furthermore, extend $\alpha_0$ to an assignment $\alpha$ for $\bar y$ such that

\[ h(\alpha(y)) = \beta(y) \quad\text{for each } y \in \bar y. \tag{6.35} \]

Note that (6.35) holds for all variables $y$ that occur in $\bar x_i$ for some $i \in \{1, \dots, k\}$, because
$h$ is an extension of $h_0$, and $\alpha$ is an extension of $\alpha_0$. For each variable $y \in \bar y$ that does
not occur in $\bar x_i$ for some $i \in \{1, \dots, k\}$, we pick an arbitrary value $u \in \mathrm{dom}(T^*) \cup C$ with
$h(u) = \beta(y)$ and define $\alpha(y) := u$. Note that such a value $u$ always exists. First recall that
the range of $\beta$ is in $\mathrm{dom}(\bigcup_{i=1}^{s} \check T_i) \cup C$. If $\beta(y) \in \mathrm{dom}(\bigcup_{i=1}^{s} \check T_i)$, then by (6.34) there is some
$u \in \mathrm{dom}(T^*)$ with $h(u) = \beta(y)$. On the other hand, if $\beta(y) \in C$, then $h(\beta(y)) = \beta(y)$,
because $h$ is the identity on constants, so that we can choose $u = \beta(y)$.

We are finally ready to show that $T^* \models \psi(\alpha)$. First note that by Proposition 6.30(4),
we have $\alpha_0(\bar x_i) \in R_i^{T_0}$ for each $i \in \{1, \dots, k\}$; since $T_0 \subseteq T^*$ and $\alpha$ extends $\alpha_0$, this implies

\[ \alpha(\bar x_i) \in R_i^{T^*} \quad\text{for each } i \in \{1, \dots, k\}. \tag{6.36} \]

Furthermore, we have

\[ \alpha(\bar w_i) \notin Q_i^{T^*} \quad\text{for each } i \in \{1, \dots, l\}. \tag{6.37} \]

Otherwise, if there is some $i \in \{1, \dots, l\}$ with $\alpha(\bar w_i) \in Q_i^{T^*}$, then by (6.34) and (6.35), we
have

\[ \beta(\bar w_i) \in Q_i^{\bigcup_{j=1}^{s} \check T_j}, \]

which is impossible by (6.23). Finally, we have

\[ \alpha(v_i) \neq \alpha(v_i') \quad\text{for each } i \in \{1, \dots, m\}. \tag{6.38} \]

Indeed, let $i \in \{1, \dots, m\}$. By (6.35), we have $h(\alpha(v_i)) = \beta(v_i)$ and $h(\alpha(v_i')) = \beta(v_i')$. On
the other hand, (6.23) implies that $\beta(v_i) \neq \beta(v_i')$, so that $\alpha(v_i)$ and $\alpha(v_i')$ must be distinct.

Altogether, (6.36)–(6.38) imply that $T^* \models \psi(\alpha)$. In particular, Algorithm 6.31 outputs
"yes" on input $T$ and $\bar t$. □
6.3.4. Proof of Proposition 6.4. We conclude this section by proving Proposition 6.4. Let
$M = (\sigma, \tau, \Sigma)$ be a schema mapping, where $\Sigma$ consists of st-tgds, and let $q$ be a universal
query over $\tau$. As in Section 6.3.3, we can assume that $\neg q$ is logically equivalent to a query
$\bar q$ of the form

\[ \bar q(\bar x) = \bigvee_{i=1}^{m} q_i(\bar x), \]

where each $q_i$ is an existential query of the form

\[ q_i(\bar x) = \exists \bar y_i \bigwedge_{j=1}^{n_i} \varphi_{i,j}, \]

and each $\varphi_{i,j}$ is an atomic FO formula or the negation of an atomic FO formula.

Let $S$ be a source instance for $M$, and let $\bar t \in \mathit{Const}^{|\bar x|}$. As shown in Section 6.3.1, we have
$\bar t \notin \mathit{cert}_{\mathrm{GCWA}^*}(q, M, S)$ if and only if there is a nonempty finite set $\mathcal{T}$ of minimal instances
in $\mathit{poss}(\mathit{Core}(M, S))$ with $\bigcup \mathcal{T} \models \neg q(\bar t)$. Hence, on input $S$ and $\bar t$, a nondeterministic Turing
machine can decide whether $\bar t \notin \mathit{cert}_{\mathrm{GCWA}^*}(q, M, S)$ by computing $\mathit{Core}(M, S)$, and by
deciding for each $i \in \{1, \dots, m\}$ whether there is a nonempty finite set $\mathcal{T}$ of minimal
instances in $\mathit{poss}(\mathit{Core}(M, S))$ with $\bigcup \mathcal{T} \models q_i(\bar t)$. If this is the case for some $i$, it accepts the
input, and otherwise, it rejects it.

By Theorem 12. H Core(M, S) can be computed in time polynomial in the size of S (for 
fixed M). 

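In order to check whether there is a nonempty finite set $\mathcal{T}$ of minimal instances in
$\mathit{poss}(\mathit{Core}(M, S))$ with $\bigcup \mathcal{T} \models q_i(\bar t)$, it suffices to "guess" a set $\mathcal{T}$ of at most

\[ s := n_i \cdot \max\{\mathrm{ar}(R) \mid R \in \tau\} \]

instances in $\mathit{poss}(\mathit{Core}(M, S))$, and to check whether $\bigcup \mathcal{T} \models q_i(\bar t)$. Indeed, let $\mathcal{T}$ be a set
of minimal instances in $\mathit{poss}(\mathit{Core}(M, S))$ with $\bigcup \mathcal{T} \models q_i(\bar t)$. Then there is an assignment $\alpha$
for the variables in $\bar x$ and $\bar y_i$ such that $\alpha(\bar x) = \bar t$ and $\bigcup \mathcal{T} \models \varphi_{i,j}(\alpha)$ for each $j \in \{1, \dots, n_i\}$.
Without loss of generality, assume that $\varphi_{i,1}, \dots, \varphi_{i,k}$ (for $0 \leq k \leq n_i$) are all the relational
atomic FO formulas in $q_i$. For each $j \in \{1, \dots, k\}$, there is an instance $T_j \in \mathcal{T}$ with $T_j \models
\varphi_{i,j}(\alpha)$. Let $\mathcal{T}' := \{T_1, \dots, T_k\} \subseteq \mathcal{T}$. Then $\bigcup \mathcal{T}' \models \varphi_{i,j}(\alpha)$ for each $j \in \{1, \dots, k\}$. To obtain
a set $\mathcal{T}_0 \subseteq \mathcal{T}$ that satisfies $\bigcup \mathcal{T}_0 \models q_i(\bar t)$, we extend $\mathcal{T}'$ as follows. Let $j \in \{k+1, \dots, n_i\}$.
Then there are at most $\max\{\mathrm{ar}(R) \mid R \in \tau\}$ values that occur in $\varphi_{i,j}(\alpha)$. In particular, we
can pick $\max\{\mathrm{ar}(R) \mid R \in \tau\}$ instances from $\mathcal{T}$ that contain all these values. Add those
instances to $\mathcal{T}'$. The resulting set $\mathcal{T}_0$ is a subset of $\mathcal{T}$, and satisfies $\bigcup \mathcal{T}_0 \models q_i(\bar t)$, since
$\bigcup \mathcal{T}_0 \models \varphi_{i,j}(\alpha)$ for each $j \in \{1, \dots, k\}$, and each remaining formula $\varphi_{i,j}(\alpha)$, being satisfied
in $\bigcup \mathcal{T}$, is also satisfied in the subinstance $\bigcup \mathcal{T}_0$. Furthermore, $\mathcal{T}_0$ contains at most $s$ instances.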

Note also that to find a nonempty finite set $\mathcal{T}$ of minimal instances in $\mathit{poss}(\mathit{Core}(M, S))$
with $|\mathcal{T}| \leq s$ and $\bigcup \mathcal{T} \models q_i(\bar t)$, it suffices to consider valuations $v$ of $\mathit{Core}(M, S)$ with range
in $C$, where $C$ contains all constants in $\mathit{Core}(M, S)$, all constants in $q_i$, all constants in $\bar t$, and
all constants in $\{c_1, \dots, c_{s \cdot k}\}$, where $k$ is the number of nulls in $\mathit{Core}(M, S)$, and $c_1, \dots, c_{s \cdot k}$
is a sequence of pairwise distinct constants that do not occur in $\mathit{Core}(M, S)$, $q_i$ and $\bar t$.

Finally, it is easy for a Turing machine to check whether a given $T \in \mathit{poss}(\mathit{Core}(M, S))$
is minimal. For each atom $A \in T$, it just has to check that the instance $T \setminus \{A\}$ is not a
solution for $S$ under $M$.
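The minimality test amounts to one solution check per atom. The sketch below assumes instances are sets of atoms `(relation, args)` and that a solution test is available as a callable; the function names and the toy st-tgd $R(x) \to \exists y\, E(x, y)$ used for illustration are hypothetical and not taken from the paper.

```python
def is_minimal(instance, is_solution):
    """Check minimality by deleting one atom at a time.

    `is_solution` is a callable deciding whether an instance is a solution
    for the fixed source instance under the fixed schema mapping.
    """
    if not is_solution(instance):
        return False
    # Minimal iff no instance obtained by removing a single atom is a solution.
    return all(not is_solution(instance - {atom}) for atom in instance)


def make_is_solution(source_values):
    """Toy solution test for the hypothetical st-tgd R(x) -> exists y E(x, y).

    A ground instance is a solution iff every source value x has some
    atom E(x, y).
    """
    def is_solution(instance):
        return all(
            any(rel == "E" and args[0] == x for rel, args in instance)
            for x in source_values
        )
    return is_solution


# Usage with source relation R = {a, b}.
sol = make_is_solution({"a", "b"})
minimal = is_minimal({("E", ("a", "1")), ("E", ("b", "1"))}, sol)
redundant = is_minimal({("E", ("a", "1")), ("E", ("a", "2")), ("E", ("b", "1"))}, sol)
```

Checking single-atom deletions suffices here because supersets of solutions for st-tgds are again solutions, so a non-minimal instance always has a solution that misses just one of its atoms.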

Altogether, given a source instance $S$ for $M$, and a tuple $\bar t \in \mathit{Const}^{|\bar x|}$, a nondeterministic
Turing machine can check whether $\bar t \notin \mathit{cert}_{\mathrm{GCWA}^*}(q, M, S)$. This proves $\mathrm{Eval}_{\mathrm{GCWA}^*}(M, q) \in$
co-NP, and in particular, Proposition 6.4.
7. Conclusion

A new semantics, called GCWA*-semantics, for answering non-monotonic queries in relational
data exchange has been proposed. The GCWA*-semantics is inspired by non-monotonic
query answering semantics from the area of deductive databases, where the
problem of answering non-monotonic queries has been studied extensively since the late
seventies. In contrast to non-monotonic query answering semantics proposed earlier in the
data exchange literature, the GCWA*-semantics can be applied to a broader class of schema
mappings (not just schema mappings defined by tgds and egds), and possesses the following
natural properties: (1) it is invariant under logically equivalent schema mappings, and (2) it
interprets existential quantifiers "inclusively" as explained in Section 3. Furthermore, under
schema mappings defined by st-tgds and egds (and even more general schema mappings like
schema mappings defined by right-monotonic $L_{\infty\omega}$-st-tgds), the answers to a query under
the GCWA*-semantics can be defined as the certain answers to the query with respect to all
ground solutions that are unions of minimal solutions.

However, the GCWA*-semantics is not meant to be a replacement for earlier semantics
proposed in the data exchange literature. Each of the earlier semantics is interesting in its
own right. In fact, I think that there is no ultimate semantics for answering non-monotonic
queries in relational data exchange. Depending on the concrete application, and the user's
expectations, one or the other of the proposed semantics may be appropriate. Nevertheless,
query answers under the GCWA*-semantics seem to be very natural, especially due to the
two properties mentioned above.

We have shown that the problem of answering non-monotonic queries under the GCWA*-semantics
can be hard, or even undecidable, in fairly simple settings. Unfortunately,
this is true not only for the GCWA*-semantics, but also for earlier semantics. This seems to
be the price that one has to pay for automatically inferring "negative data". Nevertheless,
we were able to show (Theorem 6.6) that for schema mappings M defined by packed st-tgds,
and for universal queries q, there is a polynomial time algorithm that, given the core solution
for some source instance S for M as input, outputs the set of answers to q with respect to
M and S under the GCWA*-semantics.

Quite a number of interesting research problems remain open. First, I believe that the
techniques used for proving Theorem 6.6 can be extended to prove the analogous result for
the more general case of schema mappings defined by st-tgds. In fact, it seems that all that
has to be done is to provide a proof of Lemma 6.23 for the case that the blocks of T are
not packed. Second, a lot more work has to be done for understanding the complexity of
answering non-monotonic queries not only under the GCWA*-semantics, but also under the
semantics proposed earlier. The fact that for some schema mappings M defined by st-tgds,
and for some existential queries q the data complexity of computing the GCWA*-answers to
q under M is hard does not imply that it could not be in polynomial time for other schema
mappings defined by st-tgds and other existential queries. Third, we only considered the
data complexity of evaluating queries - we did not consider the combined complexity, where
the schema mapping and the query to be answered belong to the input. Finally, instead
of answering queries under a non-monotonic semantics, it could be an interesting task to
study the problem of answering queries using the OWA-semantics, but allow more expressive
constraints to explicitly exclude "unwanted" tuples from solutions (rather than implicitly by
a variant of the CWA). For instance, instead of using the st-tgd $\theta$ in Example 1.1, we could
have used $\forall x \forall y\, (R(x, y) \leftrightarrow R'(x, y))$. Then, under the OWA-semantics, the answer to a
query would be as desired. However, this approach requires schema mappings to be fully
specified.